opensearch-k8s-operator: OpenSearch operator 2.0.4 does not work with the example OpenSearch cluster manifest.

Description

The example OpenSearch cluster YAML does not work with version 2.0.4 of the operator.

Steps to reproduce

  1. Add the Helm repo for the operator: helm repo add opensearch-operator https://opster.github.io/opensearch-k8s-operator/
  2. Install v2.0.4 of the operator: helm install opensearch-operator opensearch-operator/opensearch-operator --version 2.0.4
  3. Wait for the operator pods to be ready.
  4. Apply the manifest for the example OpenSearch cluster: kubectl apply -f ~/Git/opensearch-k8s-operator/opensearch-operator/examples/opensearch-cluster.yaml
  5. After ~10 mins check the cluster.
  6. In this instance the master and one coordinator node are restarting.
  7. Describe the master pod to debug: kubectl describe pod my-cluster-masters-0 (see my-cluster-masters-0.log)
  8. Describe the coordinator pod to debug: kubectl describe pod my-cluster-coordinators-1 (see my-cluster-coordinators-1.log)
  9. Both pods were terminated with exit code 137. The commands used above are consolidated in the sketch below this list.
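For reference, a consolidated sketch of the reproduction and debugging commands above (the repo path and pod names are from this environment; adjust to yours):

helm repo add opensearch-operator https://opster.github.io/opensearch-k8s-operator/
helm install opensearch-operator opensearch-operator/opensearch-operator --version 2.0.4
kubectl get pods                                 # wait until the operator pods are Ready
kubectl apply -f ~/Git/opensearch-k8s-operator/opensearch-operator/examples/opensearch-cluster.yaml
kubectl get pods                                 # after ~10 mins the master and a coordinator pod are restarting
kubectl describe pod my-cluster-masters-0        # Last State: Terminated, Exit Code: 137
kubectl describe pod my-cluster-coordinators-1   # Last State: Terminated, Exit Code: 137
kubectl logs my-cluster-masters-0                # capture logs for the attachments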

Steps to workaround

  1. I have tried doubling the memory requests and limits to 4Gi for each node group (see the snippet below), but I am still getting the 137 error and the pods are killed. Any advice on this would be greatly appreciated. Can you also supply a working example YAML for the latest OpenSearch images? I think 1.3.0 is 5 months old now. Thanks.
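The change amounted to bumping the node pool resources in the example manifest along these lines (a fragment only; the component name and replica count are illustrative, only the 4Gi memory values reflect the doubling described above):

nodePools:
  - component: masters        # repeated likewise for the other node groups
    replicas: 3
    resources:
      requests:
        memory: "4Gi"
      limits:
        memory: "4Gi"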

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 38 (3 by maintainers)

Most upvoted comments

I got it working!

Current Versions:

  • Helm Chart 2.0.3
  • Opensearch 2.2.0
  • K8s v1.23.7-gke.1400

Steps:

  • Use a minimal OpenSearchCluster manifest for version 1.3.2
  • Once the cluster is running, change the version to 2.2.0 (e.g. by patching the CR as sketched right after this list)
  • Profit
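A sketch of that version bump (it assumes the cluster CR is named opensearch, matching the pod names below, and uses the spec.general.version field seen in the manifests in this thread):

kubectl patch opensearchcluster opensearch --type merge \
  -p '{"spec":{"general":{"version":"2.2.0"}}}'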

What I noticed is that when deploying version 1.3.1 or 1.3.2 there is an additional bootstrapping pod; this pod is not present with higher OpenSearch versions.

NAME                                     READY   STATUS    RESTARTS   AGE
opensearch-bootstrap-0                   1/1     Running   0          4m18s
opensearch-dashboards-7b65c8ff45-zz7fv   1/1     Running   0          4m18s
opensearch-masters-0                     1/1     Running   0          4m18s
opensearch-masters-1                     1/1     Running   0          2m37s
opensearch-masters-2                     0/1     Pending   0          57s

So I guess that because there is no pod doing the bootstrapping, the cluster creation fails altogether, unless you are upgrading from a previous version, where there is a dedicated bootstrapping pod.

Attachments: Minimal cluster manifest: minimal-cluster.yaml
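The attached minimal-cluster.yaml is not reproduced inline; a minimal manifest along these lines (the cluster name, replica counts, and resource values are illustrative, and the field names follow the other manifests in this thread) is roughly what the 1.3.2 bootstrap step looks like:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opensearch
  namespace: default
spec:
  general:
    serviceName: opensearch
    version: 1.3.2
  dashboards:
    enable: true
    version: 1.3.2
    replicas: 1
  nodePools:
    - component: masters
      replicas: 3
      roles:
        - "master"      # pre-2.x role name; renamed to cluster_manager in 2.x
        - "data"
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      persistence:
        emptyDir: {}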

Sure, re-opening this issue. The example opensearch-cluster.yaml works fine; we should add a new example for 2.x (>2.0.0).

Hey all, just FYI I’m able to get the cluster up and running with the following YAML:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  security:
    config:
    tls:
      http:
        generate: true
      transport:
        generate: true
        perNode: true
  general:
    httpPort: 9400
    serviceName: my-first-cluster
    version: 2.2.1
    pluginsList: ["repository-s3"]
    drainDataNodes: true
  dashboards:
    version: 2.2.1
    enable: true
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      resources:
        requests:
          memory: "1Gi"
          cpu: "550m"
        limits:
          memory: "1Gi"
          cpu: "550m"
      roles:
        - "data"
        # - "master": since version > 2.0.0, use cluster_manager
        - "cluster_manager"
      persistence:
        emptyDir: {}

I believe there was some confusion with the roles; passing - "cluster_manager" for versions above 2.0.0 should work fine.
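A quick way to confirm the roles landed (a sketch; it assumes the default demo admin credentials and the httpPort/serviceName from the manifest above, so adjust to your setup):

kubectl port-forward svc/my-first-cluster 9400:9400 &
curl -sk -u admin:admin "https://localhost:9400/_cat/nodes?v"
# each master pod should report both the data and cluster_manager roles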

@dickescheid thanks for all the help, time, and input early on!

Hey, thanks for the update @dobharweim and to all who participated here 😃. Closing this issue; please feel free to reopen if required. We should have some details about the cluster_manager change reflected in the README docs. @idanl21 @segalziv @swoehrl-mw Thank you

@prudhvigodithi I can confirm (gladly) that the manifest above works. Thanks.

P.S. - I did initially see the second master pod getting killed with a 137 error again, so I increased resources:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  security:
    config:
    tls:
      http:
        generate: true
      transport:
        generate: true
        perNode: true
  general:
    httpPort: 9400
    serviceName: my-first-cluster
    version: 2.2.1
    pluginsList: ["repository-s3"]
    drainDataNodes: true
  dashboards:
    tls:
      enable: true
      generate: true
    version: 2.2.1
    enable: true
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      resources:
        requests:
          memory: "4Gi"
          cpu: "1000m"
        limits:
          memory: "4Gi"
          cpu: "1000m"
      roles:
        - "data"
        - "cluster_manager"
      persistence:
        emptyDir: {}

Oh, that’s right! I’ll check this tomorrow. Since the example used master and it supported both 1.x and 2.x, I assumed the operator used the CR to generate the correct configuration for each version.

I looked over the source for something else and missed this.

I concluded something similar in #251

Nice, thanks. Good to know we got it working; bummer I didn’t find it earlier, it would have saved me a lot of work.

Then we’ll have to wait for a fix.

Steps:

  • Use minimal OpenSearchCluster manifest for Version 1.3.2
  • Once cluster is running change version to 2.2.0

I concluded something similar in #251

I’m running opensearch-operator 2.0.4 and have tried removing the operator and applying it again as well. I destroyed the previously working cluster and tried recreating it, which also failed. Currently I am only hitting the bootstrap issues.

The only thing that changed between when the OpenSearch cluster was working and now is the Kubernetes cluster version. I upgraded from v1.22 to 1.23.7-gke.1400 in the last few days.

My Kubernetes cluster is GKE on Google Cloud.