rook: CSI based RBD pods go into CrashLoopBackOff upon deployment

Description

The recent changes in the ceph/csi/rbd/templates folder and the addition of ceph-csi-config (PR#3271) appear to affect the rook CSI deployment. The CSI-based RBD pods are unable to reach Running state on a fresh deployment.

Steps performed for creating the setup:

  1. Cloned the rook repo

  2. Navigated to rook/cluster/examples/kubernetes/ceph

  3. Ran common.yaml: $ oc create -f common.yaml

  4. Applied the RBAC rules from the csi folder: $ oc apply -f csi/rbac/rbd/ and $ oc apply -f csi/rbac/cephfs/

  5. Created the operator pod: $ oc create -f operator-openshift-with-csi.yaml

  6. Checked the pods. The csi-rbdplugin-* pods are stuck in ContainerCreating state.

  7. Created the Ceph cluster: $ oc create -f cluster.yaml

  8. Re-checked the pod status. The csi-rbdplugin-* pods go into CrashLoopBackOff and eventually end up in an error state. (The full command sequence is consolidated below for reference.)
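
For reference, the sequence above consolidated in one place. The commands and file paths are exactly those from the steps; only the clone of the upstream rook repo is spelled out:

$ git clone https://github.com/rook/rook.git
$ cd rook/cluster/examples/kubernetes/ceph
$ oc create -f common.yaml
$ oc apply -f csi/rbac/rbd/
$ oc apply -f csi/rbac/cephfs/
$ oc create -f operator-openshift-with-csi.yaml
$ oc get pods -n rook-ceph    # csi-rbdplugin-* pods stuck in ContainerCreating
$ oc create -f cluster.yaml
$ oc get pods -n rook-ceph    # csi-rbdplugin-* pods go into CrashLoopBackOff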


Note: The same steps were tried on ocs-ci based setups as well, with the same result.

Until cluster.yaml is run to create the Ceph cluster, the csi-rbdplugin pods remain in ContainerCreating state:

$ oc get pods -n rook-ceph

csi-rbdplugin-f999c           0/2   ContainerCreating   0   53m   10.0.158.162   ip-10-0-158-162.us-east-2.compute.internal   <none>   <none>
csi-rbdplugin-pkzx7           0/2   ContainerCreating   0   53m   10.0.168.120   ip-10-0-168-120.us-east-2.compute.internal   <none>   <none>
csi-rbdplugin-provisioner-0   0/4   ContainerCreating   0   53m   <none>         ip-10-0-158-162.us-east-2.compute.internal   <none>   <none>
csi-rbdplugin-st5jp           0/2   ContainerCreating   0   53m   10.0.135.141   ip-10-0-135-141.us-east-2.compute.internal   <none>   <none>

Pods in error state:

csi-rbdplugin-f999c                           1/2     CrashLoopBackOff    4          66m
csi-rbdplugin-pkzx7                           1/2     CrashLoopBackOff    4          66m
csi-rbdplugin-provisioner-0                   3/4     CrashLoopBackOff    5          66m
csi-rbdplugin-st5jp                           1/2     CrashLoopBackOff    4          66m

oc describe output for one of the failing pods:

Events:
  Type     Reason       Age                   From                                                 Message
  ----     ------       ----                  ----                                                 -------
  Normal   Scheduled    45m                   default-scheduler                                    Successfully assigned rook-ceph/csi-rbdplugin-provisioner-0 to ip-10-0-158-162.us-east-2.compute.internal
  Warning  FailedMount  15m (x23 over 45m)    kubelet, ip-10-0-158-162.us-east-2.compute.internal  MountVolume.SetUp failed for volume "ceph-csi-config" : configmaps "rook-ceph-mon-endpoints" not found
  Warning  FailedMount  5m26s (x18 over 43m)  kubelet, ip-10-0-158-162.us-east-2.compute.internal  Unable to mount volumes for pod "csi-rbdplugin-provisioner-0_rook-ceph(49342068-90c4-11e9-906d-0ac7385058d4)": timeout expired waiting for volumes to attach or mount for pod "rook-ceph"/"csi-rbdplugin-provisioner-0". list of unmounted volumes=[ceph-csi-config]. list of unattached volumes=[host-dev host-rootfs host-sys lib-modules socket-dir ceph-csi-config rook-csi-rbd-provisioner-sa-token-htkg5]
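
To confirm which ConfigMap the ceph-csi-config volume actually resolves to (and therefore what the pod is waiting on), a jsonpath query against the pod spec can be used. This is only a diagnostic sketch, using the provisioner pod from the events above:

$ oc -n rook-ceph get pod csi-rbdplugin-provisioner-0 \
    -o jsonpath='{.spec.volumes[?(@.name=="ceph-csi-config")].configMap.name}'

Per the FailedMount event, this should print rook-ceph-mon-endpoints, which does not exist until cluster.yaml is applied.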

$ oc get cm -n rook-ceph

NAME                                                      DATA   AGE
local-device-ip-10-0-135-141.us-east-2.compute.internal   1      68m
local-device-ip-10-0-158-162.us-east-2.compute.internal   1      69m
local-device-ip-10-0-168-120.us-east-2.compute.internal   1      68m
rook-ceph-config                                          1      4m36s
rook-ceph-mon-endpoints                                   4      4m36s
rook-config-override                                      1      4m36s
rook-crush-config                                         1      3m39s
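
Once the ConfigMap does show up (as in the listing above) but the CSI pods are already in CrashLoopBackOff, it can help to inspect what the pods actually receive from it and why the container keeps restarting. Two diagnostic commands, with the caveat that the container name csi-rbdplugin inside the plugin pod is an assumption not confirmed from this issue:

$ oc -n rook-ceph get cm rook-ceph-mon-endpoints -o yaml
$ oc -n rook-ceph logs csi-rbdplugin-f999c -c csi-rbdplugin --previous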

$ oc get nodes

NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-129-19.us-east-2.compute.internal    Ready    master   66m   v1.13.4+cb455d664
ip-10-0-135-141.us-east-2.compute.internal   Ready    worker   62m   v1.13.4+cb455d664
ip-10-0-149-108.us-east-2.compute.internal   Ready    master   66m   v1.13.4+cb455d664
ip-10-0-158-162.us-east-2.compute.internal   Ready    worker   62m   v1.13.4+cb455d664
ip-10-0-168-120.us-east-2.compute.internal   Ready    worker   62m   v1.13.4+cb455d664
ip-10-0-173-77.us-east-2.compute.internal    Ready    master   66m   v1.13.4+cb455d664

oc get pods -n rook-ceph

NAME                                          READY   STATUS             RESTARTS   AGE
csi-cephfsplugin-fj7w6                        2/2     Running            1          70m
csi-cephfsplugin-lsc7f                        2/2     Running            0          70m
csi-cephfsplugin-provisioner-0                3/3     Running            0          70m
csi-cephfsplugin-xz5wx                        2/2     Running            0          70m
csi-rbdplugin-f999c                           0/2     CrashLoopBackOff   8          70m
csi-rbdplugin-pkzx7                           0/2     CrashLoopBackOff   8          70m
csi-rbdplugin-provisioner-0                   2/4     CrashLoopBackOff   12         70m
csi-rbdplugin-st5jp                           0/2     CrashLoopBackOff   8          70m
rook-ceph-agent-b5wdl                         1/1     Running            0          70m
rook-ceph-agent-w62wh                         1/1     Running            0          70m
rook-ceph-agent-zbjkg                         1/1     Running            0          70m
rook-ceph-mgr-a-7c9b5494dd-8lpbw              1/1     Running            0          4m43s
rook-ceph-mon-a-84bb86696c-t4hx4              1/1     Running            0          5m41s
rook-ceph-mon-b-79db689db5-wsmwt              1/1     Running            0          5m29s
rook-ceph-mon-c-5c995b74bb-5jrp9              1/1     Running            0          5m6s
rook-ceph-operator-5b6856f864-bm2rq           1/1     Running            0          71m
rook-ceph-osd-0-7d59d5b7b5-bghhd              1/1     Running            0          3m24s
rook-ceph-osd-1-677fdb8f9f-q77v5              1/1     Running            0          3m19s
rook-ceph-osd-2-78f5f56fd5-lfjgz              1/1     Running            0          3m14s
rook-ceph-osd-prepare-ip-10-0-135-141-8cj6b   0/2     Completed          0          4m12s
rook-ceph-osd-prepare-ip-10-0-158-162-c5qz9   0/2     Completed          0          4m12s
rook-ceph-osd-prepare-ip-10-0-168-120-5vspd   0/2     Completed          0          4m12s
rook-discover-c7qbb                           1/1     Running            0          70m
rook-discover-dg4ck                           1/1     Running            0          70m
rook-discover-g75qd                           1/1     Running            0          70m

Deviation from expected behavior: With the recent changes, the csi-rbdplugin pods fail to come up on a fresh deployment; the changes should not affect their deployment.

Expected behavior: All csi-rbdplugin pods should come up in Running state.

Environment:

  • OS (e.g. from /etc/os-release): Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)

  • Kernel (e.g. uname -a): 4.18.0-80.1.2.el8_0.x86_64

  • Cloud provider or hardware configuration: AWS

  • Rook version (use rook version inside of a Rook Pod): rook: v1.0.0-122.g488fe64

  • Openshift version (use kubectl version):

Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0-201905091432+4910781-dirty", GitCommit:"4910781", GitTreeState:"dirty", BuildDate:"2019-05-09T19:19:42Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+838b4fa", GitCommit:"838b4fa", GitTreeState:"clean", BuildDate:"2019-05-19T23:51:04Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Openshift

oc get clusterversion

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0     True        False         83m     Cluster version is 4.1.0

  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): ceph health returns HEALTH_OK

Most upvoted comments

@j-griffith hope your issue is resolved and we can close this one

closing this as it's fixed

The issue seems to be due to the missing ceph-csi-config configmap when bringing up the Ceph-CSI related pods. @phlogistonjohn, could the step that runs oc create -f operator-openshift-with-csi.yaml create an empty config file so that the pods can start successfully? Is that feasible?
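
A rough sketch of what such an empty placeholder could look like is below. The names here are assumptions: the comment refers to a ceph-csi-config ConfigMap, while the events above show the volume resolving to rook-ceph-mon-endpoints, so the actual ConfigMap name and key depend on the csi templates at the revision in use; the csi-cluster-config-json key with an empty JSON list follows the ceph-csi cluster-config convention and is not confirmed from this issue.

$ cat <<EOF | oc -n rook-ceph create -f -
apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed name; it must match whatever ConfigMap the csi templates mount
  # as the ceph-csi-config volume.
  name: ceph-csi-config
data:
  # Assumed key; an empty cluster list would let the CSI pods start before a
  # CephCluster (and its mon endpoints) exists.
  csi-cluster-config-json: '[]'
EOF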