opensearch-k8s-operator: Cluster failure after migrating nodepool 'master not discovered or elected yet, an election requires at least 2 nodes with ids'

With a successfully running cluster with 3 masters, I needed to migrate to a new nodepool (for unrelated reasons). However, after the migration the cluster was in an unusable state: only a single master, cluster-masters-0, was running, with the following error in the logs:

[cluster-masters-0] master not discovered or elected yet, an election requires at least 2 nodes with ids from [LNCCMf06T4-3j8DIibFf2g, Y4A36mMzRNeVtsWy7A2kEw, RLoWaFxKSDy04rOanWh50Q], have discovered [{cluster-masters-0}{Y4A36mMzRNeVtsWy7A2kEw}{s43hlqGfRfq1IaTL7kXAtw}{cluster-masters-0}{10.104.8.14:9300}{dm}{shard_indexing_pressure_enabled=true}] which is not a quorum; discovery will continue using [] from hosts providers and [{cluster-masters-0}{Y4A36mMzRNeVtsWy7A2kEw}{s43hlqGfRfq1IaTL7kXAtw}{cluster-masters-0}{10.104.8.14:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 6, last-accepted version 50 in term 5

I think I need to add a PodDisruptionBudget to ensure 2 pods stay running while doing a migration of this sort, but I cannot see that option in the cluster config.
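For reference, the operator CRD does not expose a PodDisruptionBudget, but one can be created alongside the cluster as a plain Kubernetes resource. A minimal sketch, assuming the cluster is named cluster and the operator labels the master pods with opster.io/opensearch-cluster and opster.io/opensearch-nodepool (these label keys are an assumption; verify with kubectl get pods --show-labels and adjust):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cluster-masters-pdb
spec:
  minAvailable: 2  # keep quorum (2 of 3 masters) during voluntary disruptions
  selector:
    matchLabels:
      # Assumed label keys -- check your pods' actual labels before applying
      opster.io/opensearch-cluster: cluster
      opster.io/opensearch-nodepool: masters

Note that a PDB only guards against voluntary disruptions such as node drains during a nodepool migration; it does not help once the pods are already gone.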

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 17 (1 by maintainers)

Most upvoted comments

The podManagementPolicy field is immutable after the statefulset is created.

Ran into this issue. I scaled the operator to zero replicas, then deleted the statefulset with --cascade=orphan, then recreated the statefulset with podManagementPolicy set to Parallel. Then, once the cluster had recovered, I deleted the statefulset again and switched it back to OrderedReady.
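For anyone hitting this, a rough sketch of that sequence (the statefulset name is from this thread's example; the operator deployment name is assumed from the Helm chart and may differ in your setup):

# Stop the operator so it does not immediately reconcile the statefulset back
kubectl scale deployment opensearch-operator-controller-manager --replicas=0

# Save the statefulset, then delete it without deleting its pods
kubectl get statefulset cluster-masters -o yaml > cluster-masters.yaml
kubectl delete statefulset cluster-masters --cascade=orphan

# In cluster-masters.yaml set spec.podManagementPolicy to Parallel and strip
# server-managed fields (status, metadata.resourceVersion, metadata.uid), then:
kubectl apply -f cluster-masters.yaml

# Once the cluster has recovered, repeat the orphan delete/recreate with
# OrderedReady restored, and scale the operator back up
kubectl scale deployment opensearch-operator-controller-manager --replicas=1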

@ahmedrshdy

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  general:
    version: 2.2.1
    httpPort: 9200
    vendor: opensearch
    serviceName: my-cluster
  dashboards:
    version: 2.2.1
    enable: true
    replicas: 1
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "cluster_manager"
        - "data"

(With the operator installed via Helm chart version 2.0.4.) After the cluster was fully formed I ran kubectl delete pod my-cluster-masters-2. The pod was recreated and became healthy. I’m testing this on a local k3d cluster.

The podManagementPolicy field is immutable after the statefulset is created.

Right, sorry, didn’t think of this.

@ahmedrshdy

The terminated node is re-provisioned as expected but stuck in unready state.

I cannot reproduce this. In my experiments, deploying a cluster and then deleting a master pod recreated the pod, and after about two minutes it was healthy again.

@mahdiG

Hi. I’ve been facing this exact problem. It means that if the pods are deleted for any reason, the OpenSearch cluster never comes back up again, which is a big issue. In my experiments I deleted only one of the 3 cluster_manager (master) pods of the statefulset, and it deleted the other replicas too! I don’t know if that’s expected, but in that case I can’t tolerate even one node failure or the entire cluster is down forever! It’s impossible to ensure disaster never happens, but it’s important that if it does and all pods go down, they can come back up and the cluster becomes functional again 😃

I could not reproduce the case of deleting just one pod; in my experiments the other pods stayed up. But I agree that if all the pods are deleted at the same time, the cluster currently does not come back up again.

As this issue is already a mix of discussion I’ve created #289 to track this problem separately.

@wesleyjconnorsesame: Regarding downsizing not happening with smartScaler enabled: this is likely the same cause as in #227; a PR is on the way.

@jinchengsix: I think I understand now what is happening: the statefulsets for the nodepools are configured with a podManagementPolicy of OrderedReady. This means kubernetes waits for the first pod to be ready before starting the second. This is normally a good thing, as it helps the operator do operations in a rolling manner while always keeping the cluster in a working state. During initial cluster setup the operator works around the quorum problem by launching a separate bootstrap pod to facilitate that. But all this means that in your case the cluster will not come back up, as the first pod can never become ready without being able to form a quorum. Not sure if this is a situation we want to handle in the operator; if you feel so, please open a new issue so we have it on the roadmap as a potential enhancement. As a hack to recover a cluster in this situation you can try to manually edit the statefulset (cluster-masters in your case) and temporarily set the podManagementPolicy to Parallel.
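For concreteness, the field in question on the generated StatefulSet looks like this (and since, as noted above, it is immutable, changing it in practice requires the --cascade=orphan delete-and-recreate sequence described earlier in this thread):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cluster-masters
spec:
  # Operator default is OrderedReady; Parallel lets all master pods start at
  # once so they can discover each other and form a quorum
  podManagementPolicy: Parallel
  # (remaining fields generated by the operator are unchanged)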

@swoehrl-mw hi, I got the same error, not sure if it’s the same. What I did is as follows:

  1. Deploy the cluster normally with examples/opensearch-cluster.yaml
  2. All resources are Running fine
  3. Delete the statefulset [cluster-masters]
  4. Pod cluster-masters-0 is recreated but never becomes Ready, no matter how long I wait
  5. kubectl -n opensearch logs -f cluster-masters-0 shows the error as above

opensearch-cluster.yaml

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: cluster
spec:
  general:
    version: 1.3.0
    httpPort: 9200
    vendor: opensearch
    serviceName: cluster
    pluginsList: ["repository-s3", "https://github.com/aiven/prometheus-exporter-plugin-for-opensearch/releases/download/1.3.0.0/prometheus-exporter-1.3.0.0.zip"]
  dashboards:
    version: 1.3.0
    enable: true
    replicas: 1
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "50Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      roles:
        - "master"
    - component: nodes
      replicas: 3
      diskSize: "80Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      roles:
        - "data"
    - component: ingest
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      roles:
        - "ingest"

(screenshot attached)