cortex: the ring never removes the old ingester even if the ingester pod is evicted

I have a similar problem to #1502: when my ingester pod is evicted, a new ingester pod is created. Now the ring has two ingesters, but only one (the new one) is healthy. The old one is never removed from the ring, even if I delete the evicted pod manually. The ring information is as follows:

| ID | State | Address | Last heartbeat | Tokens | Ownership | Actions |
|----|-------|---------|----------------|--------|-----------|---------|
| ingester-7fc8759d7f-nzb6g | ACTIVE | 172.16.0.62:9095 | 2019-07-19 03:33:32 +0000 UTC | 128 | 45.739077787319914% | Forget |
| ingester-7fc8759d7f-wmnms | Unhealthy | 172.16.0.93:9095 | 2019-07-18 14:46:18 +0000 UTC | 128 | 54.260922212680086% | Forget |

The ingester's status is always unready, and the distributor reports this error:

level=warn ts=2019-07-19T03:41:45.413839063Z caller=server.go:1995 traceID=daf4028f530860f msg="POST /api/prom/push (500) 727.847µs Response: \"at least 1 live ingesters required, could only find 0\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 3742; Content-Type: application/x-protobuf; User-Agent: Prometheus/2.11.0; X-Forwarded-For: 172.16.0.17; X-Forwarded-Host: perf.monitorefk.huawei.com; X-Forwarded-Port: 443; X-Forwarded-Proto: https; X-Original-Uri: /api/prom/push; X-Prometheus-Remote-Write-Version: 0.1.0; X-Real-Ip: 172.16.0.17; X-Request-Id: 62a470dc6de7a83c8974e3411fa63e40; X-Scheme: https; X-Scope-Orgid: custom; "

I wonder if there is any way to handle this situation automatically? Maybe check the replication factor and remove excess unhealthy ingesters from the ring?
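
For what it's worth, the "Forget" button on the ring page can also be driven over plain HTTP, which is what the automated workarounds further down this thread build on. A rough manual sketch, assuming the ring page is served at /ingester/ring on the distributor's HTTP port as in those workarounds (the host, port and any csrf_token requirement depend on your deployment):

# List the IDs of unhealthy ingesters in the ring (requires jq).
curl -s -H "Accept: application/json" http://cortex-distributor:8080/ingester/ring |
  jq -r '.shards[] | select(.state=="Unhealthy") | .id'

# Forget one unhealthy ingester by ID (the ID below is the one from the table above).
curl -d "forget=ingester-7fc8759d7f-wmnms" -H "Accept: application/json" \
  http://cortex-distributor:8080/ingester/ring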

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 8
  • Comments: 45 (18 by maintainers)

Most upvoted comments

The ingester.autoforget_unhealthy configuration option has existed in Loki since https://github.com/grafana/loki/pull/3919 was merged.

Would it be possible to add the same functionality into Cortex?

Or is there another way to achieve the same behaviour as Loki's ingester.autoforget_unhealthy?

I would be happy to take a stab at implementing ingester.autoforget_unhealthy based on the Loki implementation if the maintainers think it makes sense.
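
For reference, on the Loki side this is just a boolean on the ingester config. A minimal sketch of what enabling it looks like there (this is Loki configuration, not an existing Cortex option, and the exact key is worth double-checking against the linked PR):

# Loki configuration fragment -- ingesters whose heartbeat is older than the
# ring heartbeat timeout get forgotten from the ring automatically.
ingester:
  autoforget_unhealthy: true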

We ended up adding these Kubernetes resources for an automatic cleanup of unhealthy ingesters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ingester-cleanup-script
  namespace: cortex
data:
  script: |
    # Periodically scan the distributor's ring page and "forget" every
    # ingester that is reported as Unhealthy.
    while true; do
      # The alpine base image ships without curl and jq; install them if missing.
      which curl > /dev/null 2>&1
      if [ $? -eq 1 ]; then
        apk add curl
      fi
      which jq > /dev/null 2>&1
      if [ $? -eq 1 ]; then
        apk add jq
      fi

      # Fetch the ring as JSON, extract the IDs of all unhealthy ingesters and
      # POST a "forget" request for each of them.
      curl -H "Accept: application/json" http://cortex-distributor:8080/ingester/ring |
        jq ".shards[] | select(.state==\"Unhealthy\") | .id" |
        sed 's|"||g' |
        xargs -I{} curl -d "forget={}" -d 'csrf_token=$__CSRF_TOKEN_PLACEHOLDER__' -H "Accept: application/json" http://cortex-distributor:8080/ingester/ring

      sleep 3
    done
    true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cortex-ingester-cleanup
  namespace: cortex
  labels:
    app: cortex-ingester-cleanup
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cortex-ingester-cleanup
  template:
    metadata:
      labels:
        app: cortex-ingester-cleanup
        revision: '1'
    spec:
      containers:
        - name: cortex-ingester-cleanup
          image: alpine
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
          command:
            - sh
            - -c
            - "apk add bash && exec bash /cortex-ingester-cleanup.sh"
          volumeMounts:
            - name: cortex-ingester-cleanup-script
              mountPath: /cortex-ingester-cleanup.sh
              subPath: script
      volumes:
        - name: cortex-ingester-cleanup-script
          configMap:
            name: cortex-ingester-cleanup-script

Hi – FYI I’ve found this /ready behavior plays badly with StatefulSets using an “Ordered” Pod Management Policy (the default). I believe the fix is easy – use a “Parallel” policy – but documenting the problematic scenario:

Suppose you have 3 SS replicas with “Ordered” policy:

  • pod-0, pod-1, and pod-2 are all running
  • pod-1 & pod-2 have the power yanked at (approximately) the same time
  • pod-1 is re-started by the SS replica controller
  • pod-0 is marked as unhealthy, because it can’t talk to pod-2
  • pod-1 becomes healthy
  • The replica controller is wedged because pod-0 is still unhealthy, so pod-2 is never started

I experienced this running with preemptible nodes (I know, I know) and confirmed with manual testing. If the “Parallel” policy is used instead then pod-1 & pod-2 start in parallel and pick up their former places in the ring.
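
For anyone else hitting this, the policy lives on the StatefulSet spec and, as far as I know, cannot be changed in place (the StatefulSet has to be recreated, e.g. deleted with its pods orphaned). A minimal sketch with placeholder names, not taken from any official Cortex manifest:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester            # placeholder name
spec:
  # "Parallel" lets pods be created and replaced concurrently instead of
  # waiting for lower-ordinal pods to become Ready first.
  podManagementPolicy: Parallel
  serviceName: ingester
  replicas: 3
  selector:
    matchLabels:
      app: ingester
  template:
    metadata:
      labels:
        app: ingester
    spec:
      containers:
        - name: ingester
          image: quay.io/cortexproject/cortex   # pin a specific version in practice
          args: ["-target=ingester"]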

In a Kubernetes & Helm based scenario, these Helm values could be a workaround:

ingester:
  initContainers:
    - name: cleanup-unhealthy-ingesters
      image: alpine
      command:
        - sh
        - -c
        - 'apk add curl jq && curl -H "Accept: application/json" http://cortex-distributor:8080/ingester/ring | jq ".shards[] | select(.state==\"UNHEALTHY\") | .id" | xargs -I{} curl -d "forget={}" -H "Accept: application/json" http://cortex-distributor:8080/ingester/ring'

Please be aware that you need to change the two URLs to match your Helm release name. Here it is cortex, so the URL is http://{{ .Release.Name }}-distributor:8080/ingester/ring. Please test thoroughly and contribute your enhancements.

> whether there is a way to have the ingester ring self-heal in case of unclean shutdowns.

Nobody has coded one for Cortex, to my knowledge.

> deploying to AWS with spot instances

We tell you not to do this in the docs.

ingester.autoforget_unhealthy would be amazing when deploying to AWS with spot instances, where ingesters get destroyed and spun up again. Exposing the Cortex ring status web interface to manually remove unhealthy ingesters is not practical, and it is a security concern.

Got bitten by this terribly several times now, and lost a lot of time and data 😦; would really love to see ingester.autoforget_unhealthy support in Cortex.

I’ve read through this issue and the linked issues, and it’s still unclear to me whether there is a way to have the ingester ring self-heal in case of unclean shutdowns. Not needing human operator intervention would be extremely valuable to us, as we are losing much more data from ingesters being down than we would lose by auto-forgetting unhealthy ingesters from the ring.

Now that chunks storage is deprecated and we use blocks storage, we no longer “hand-over” from one ingester to another. So one justification for this behaviour has disappeared.

Happy to hear experience reports from people who did automate it.