rook: CSI based RBD pods go into CrashLoopBackOff upon deployment
Description
The recent changes in the ceph/csi/rbd/templates folder and the addition of ceph-csi-config (PR#3271) seem to be affecting the rook CSI deployment. The CSI-based RBD pods are unable to come up in the Running state on a fresh deployment.
Steps performed for creating the setup:
- Cloned the rook repo.
- Navigated to rook/cluster/examples/kubernetes/ceph.
- Ran common.yaml: $ oc create -f common.yaml
- Applied the RBAC rules from the csi folder: $ oc apply -f csi/rbac/rbd/ and $ oc apply -f csi/rbac/cephfs/
- Created the operator pod: $ oc create -f operator-openshift-with-csi.yaml
- Checked the pods: the pods are stuck in the ContainerCreating state.
- Created the Ceph pods: $ oc create -f cluster.yaml
- Re-checked the pod status: the csi-rbdplugin-* pods go into CrashLoopBackOff and finally into the Error state.
Note: Tried similar steps in ocs-ci based setups as well, with the same issue. The full command sequence is collected in the shell snippet below.
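For reference, the same sequence as a single shell session (paths as in the steps above; a sketch of the reproduction, not a verified script):

# Clone the repo and switch to the Ceph example manifests
git clone https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph

# Common resources and CSI RBAC
oc create -f common.yaml
oc apply -f csi/rbac/rbd/
oc apply -f csi/rbac/cephfs/

# Operator with CSI enabled, then the Ceph cluster itself
oc create -f operator-openshift-with-csi.yaml
oc get pods -n rook-ceph    # csi-rbdplugin-* stuck in ContainerCreating
oc create -f cluster.yaml
oc get pods -n rook-ceph    # csi-rbdplugin-* move to CrashLoopBackOff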
Until cluster.yaml was run to create the Ceph cluster, the csi-rbdplugin pods were stuck in the ContainerCreating state:
$ oc get pods -n rook-ceph
csi-rbdplugin-f999c           0/2   ContainerCreating   0   53m   10.0.158.162   ip-10-0-158-162.us-east-2.compute.internal   <none>   <none>
csi-rbdplugin-pkzx7           0/2   ContainerCreating   0   53m   10.0.168.120   ip-10-0-168-120.us-east-2.compute.internal   <none>   <none>
csi-rbdplugin-provisioner-0   0/4   ContainerCreating   0   53m   <none>         ip-10-0-158-162.us-east-2.compute.internal   <none>   <none>
csi-rbdplugin-st5jp           0/2   ContainerCreating   0   53m   10.0.135.141   ip-10-0-135-141.us-east-2.compute.internal   <none>   <none>
Pods in error state:
csi-rbdplugin-f999c 1/2 CrashLoopBackOff 4 66m
csi-rbdplugin-pkzx7 1/2 CrashLoopBackOff 4 66m
csi-rbdplugin-provisioner-0 3/4 CrashLoopBackOff 5 66m
csi-rbdplugin-st5jp 1/2 CrashLoopBackOff 4 66m
oc describe output for one of the failing pods:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45m default-scheduler Successfully assigned rook-ceph/csi-rbdplugin-provisioner-0 to ip-10-0-158-162.us-east-2.compute.internal
**Warning FailedMount 15m (x23 over 45m) kubelet, ip-10-0-158-162.us-east-2.compute.internal MountVolume.SetUp failed for volume "ceph-csi-config" : configmaps "rook-ceph-mon-endpoints" not found**
**Warning FailedMount 5m26s (x18 over 43m) kubelet, ip-10-0-158-162.us-east-2.compute.internal Unable to mount volumes for pod "csi-rbdplugin-provisioner-0_rook-ceph(49342068-90c4-11e9-906d-0ac7385058d4)": timeout expired waiting for volumes to attach or mount for pod "rook-ceph"/"csi-rbdplugin-provisioner-0". list of unmounted volumes=[ceph-csi-config]. list of unattached volumes=[host-dev host-rootfs host-sys lib-modules socket-dir ceph-csi-config rook-csi-rbd-provisioner-sa-token-htkg5]**
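The FailedMount event suggests the ceph-csi-config volume in the rbd plugin/provisioner templates is backed by the rook-ceph-mon-endpoints ConfigMap, which the operator only creates once a CephCluster exists, so the volume cannot be mounted before cluster.yaml is applied. One way to confirm this on a live cluster (the DaemonSet name csi-rbdplugin is assumed from the pod names):

# Which ConfigMap backs the ceph-csi-config volume? (resource name assumed)
oc -n rook-ceph get daemonset csi-rbdplugin \
  -o jsonpath='{.spec.template.spec.volumes[?(@.name=="ceph-csi-config")].configMap.name}'
# expected: rook-ceph-mon-endpoints

# The ConfigMap is absent until cluster.yaml has been applied
oc -n rook-ceph get configmap rook-ceph-mon-endpoints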
$ oc get cm -n rook-ceph
NAME DATA AGE
local-device-ip-10-0-135-141.us-east-2.compute.internal 1 68m
local-device-ip-10-0-158-162.us-east-2.compute.internal 1 69m
local-device-ip-10-0-168-120.us-east-2.compute.internal 1 68m
rook-ceph-config 1 4m36s
rook-ceph-mon-endpoints 4 4m36s
rook-config-override 1 4m36s
rook-crush-config 1 3m39s
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-129-19.us-east-2.compute.internal Ready master 66m v1.13.4+cb455d664
ip-10-0-135-141.us-east-2.compute.internal Ready worker 62m v1.13.4+cb455d664
ip-10-0-149-108.us-east-2.compute.internal Ready master 66m v1.13.4+cb455d664
ip-10-0-158-162.us-east-2.compute.internal Ready worker 62m v1.13.4+cb455d664
ip-10-0-168-120.us-east-2.compute.internal Ready worker 62m v1.13.4+cb455d664
ip-10-0-173-77.us-east-2.compute.internal Ready master 66m v1.13.4+cb455d664
$ oc get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-fj7w6 2/2 Running 1 70m
csi-cephfsplugin-lsc7f 2/2 Running 0 70m
csi-cephfsplugin-provisioner-0 3/3 Running 0 70m
csi-cephfsplugin-xz5wx 2/2 Running 0 70m
csi-rbdplugin-f999c 0/2 CrashLoopBackOff 8 70m
csi-rbdplugin-pkzx7 0/2 CrashLoopBackOff 8 70m
csi-rbdplugin-provisioner-0 2/4 CrashLoopBackOff 12 70m
csi-rbdplugin-st5jp 0/2 CrashLoopBackOff 8 70m
rook-ceph-agent-b5wdl 1/1 Running 0 70m
rook-ceph-agent-w62wh 1/1 Running 0 70m
rook-ceph-agent-zbjkg 1/1 Running 0 70m
rook-ceph-mgr-a-7c9b5494dd-8lpbw 1/1 Running 0 4m43s
rook-ceph-mon-a-84bb86696c-t4hx4 1/1 Running 0 5m41s
rook-ceph-mon-b-79db689db5-wsmwt 1/1 Running 0 5m29s
rook-ceph-mon-c-5c995b74bb-5jrp9 1/1 Running 0 5m6s
rook-ceph-operator-5b6856f864-bm2rq 1/1 Running 0 71m
rook-ceph-osd-0-7d59d5b7b5-bghhd 1/1 Running 0 3m24s
rook-ceph-osd-1-677fdb8f9f-q77v5 1/1 Running 0 3m19s
rook-ceph-osd-2-78f5f56fd5-lfjgz 1/1 Running 0 3m14s
rook-ceph-osd-prepare-ip-10-0-135-141-8cj6b 0/2 Completed 0 4m12s
rook-ceph-osd-prepare-ip-10-0-158-162-c5qz9 0/2 Completed 0 4m12s
rook-ceph-osd-prepare-ip-10-0-168-120-5vspd 0/2 Completed 0 4m12s
rook-discover-c7qbb 1/1 Running 0 70m
rook-discover-dg4ck 1/1 Running 0 70m
rook-discover-g75qd 1/1 Running 0 70m
Deviation from expected behavior: The recent changes should not affect the deployment of csi-rbdplugin pods.
Expected behavior: All csi-rbdplugin pods should come up in Running state.
Environment:
- OS (e.g. from /etc/os-release): Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)
- Kernel (e.g. uname -a): 4.18.0-80.1.2.el8_0.x86_64
- Cloud provider or hardware configuration: AWS
- Rook version (use rook version inside of a Rook Pod): rook: v1.0.0-122.g488fe64
- OpenShift version (use kubectl version):
  Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0-201905091432+4910781-dirty", GitCommit:"4910781", GitTreeState:"dirty", BuildDate:"2019-05-09T19:19:42Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+838b4fa", GitCommit:"838b4fa", GitTreeState:"clean", BuildDate:"2019-05-19T23:51:04Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Openshift
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.1.0 True False 83m Cluster version is 4.1.0
- Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): ceph health HEALTH_OK
About this issue
- State: closed
- Created 5 years ago
- Comments: 19 (11 by maintainers)
Commits related to this issue
- Update openshift with Ceph-CSI to image versions as in Kubernetes case PR #3217 changed the pod manifest to drop some parameters to the ceph-csi pods. This also resulted in a change to the operator w... — committed to ShyamsundarR/rook by ShyamsundarR 5 years ago
- Need CSI RBD image added to example The operator-with-csi.yaml example now needs the updated CSI RBD image specified explicitly, otherwise the old image is looking for the metadata arg and fails to s... — committed to j-griffith/rook by j-griffith 5 years ago
@j-griffith hope your issue is resolved and we can close this one
Closing this as it's fixed.
The case seems to be due to the missing ConfigMap ceph-csi-config when bringing up the Ceph-CSI related pods. @phlogistonjohn the step that runs oc create -f operator-openshift-with-csi.yaml could create an empty config file so the pods start successfully; is that feasible?
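A minimal sketch of that suggestion as a manual workaround, assuming the missing ConfigMap only needs to exist for the volume mount to succeed (the name is taken from the FailedMount event; the key and its empty value are guesses, and the operator would later replace them with real monitor data):

# Hypothetical pre-creation of the ConfigMap that the ceph-csi-config volume points at,
# so the csi-rbdplugin pods can start before the CephCluster is created.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-mon-endpoints
  namespace: rook-ceph
data:
  csi-cluster-config-json: "[]"   # key name is an assumption, not taken from the rook templates
EOF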