rook: csi-rbdplugin missing after node failure

Is this a bug report or feature request?

  • Bug Report: After a worker node failed, there are no csi-rbdplugin pods anymore. How can I recreate them? For details, see this output:
NAME                                                   READY   STATUS      RESTARTS   AGE
rook-ceph-crashcollector-vsrvk8w001-58b685f754-4862x   1/1     Running     0          7d8h
rook-ceph-crashcollector-vsrvk8w002-56f7cf79c-dp5g7    1/1     Running     0          7d8h
rook-ceph-crashcollector-vsrvk8w003-b6bc5db68-lmdqp    1/1     Running     0          33h
rook-ceph-mgr-a-5bcf49455f-kbpc5                       1/1     Running     0          7d8h
rook-ceph-mon-a-cfd699798-czw7b                        1/1     Running     0          7d8h
rook-ceph-mon-b-6c765d57d7-wcm5h                       1/1     Running     0          33h
rook-ceph-mon-c-549dc995fc-b64xf                       1/1     Running     0          7d8h
rook-ceph-mon-d-canary-5867bbd5c7-rvt9b                0/1     Pending     0          8m53s
rook-ceph-operator-6d8fb9498b-r9czf                    1/1     Running     1          3d5h
rook-ceph-osd-0-9445b78c-7kbgv                         1/1     Running     0          7d8h
rook-ceph-osd-1-85d485d67c-ftchf                       1/1     Running     0          7d8h
rook-ceph-osd-2-b4d7b8995-2lctw                        1/1     Running     0          33h
rook-ceph-osd-prepare-vsrvk8w001-2w827                 0/1     Completed   0          8h
rook-ceph-osd-prepare-vsrvk8w002-qsrlh                 0/1     Completed   0          8h
rook-ceph-tools-685d84df94-sr2xn                       1/1     Running     0          8m7s
rook-discover-5nrtr                                    1/1     Running     0          7d8h
rook-discover-cvsx2                                    1/1     Running     1          7d8h
rook-discover-gqfhw                                    1/1     Running     0          7d8h
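
For reference, a quick way to check whether the operator-managed CSI workloads still exist is sketched below; the rook-ceph namespace is an assumption based on a default install (adjust to yours):
kubectl -n rook-ceph get daemonset                 # the csi-rbdplugin pods are normally created by a DaemonSet
kubectl -n rook-ceph get pods -o wide | grep csi   # any remaining CSI pods and the nodes they run on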

Deviation from expected behavior: After the node failure, no csi-rbdplugin pods exist in the cluster (see the pod listing above).

Expected behavior: The CSI plugin pods (csi-rbdplugin) should be running.

How to reproduce it (minimal and precise): One worker node ran out of memory, so some of its pods were stopped.
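
Using a node name taken from the pod listing above as a placeholder (replace it with the node that actually failed), a quick check for memory-pressure evictions could look like this:
kubectl get pods -A --field-selector=status.phase=Failed     # evicted / OOM-stopped pods are reported as Failed
kubectl describe node vsrvk8w003 | grep -iA2 MemoryPressure  # node condition reported by the kubelet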

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator’s logs, if necessary
  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert code button in the GitHub UI. Read the GitHub documentation if you need help.
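
For example, assuming the cluster runs in the rook-ceph namespace (adjust to your install), the operator log from the pod listed above can be fetched with either of these:
kubectl -n rook-ceph logs deploy/rook-ceph-operator
kubectl -n rook-ceph logs rook-ceph-operator-6d8fb9498b-r9czf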

Environment:

  • OS (e.g. from /etc/os-release): Debian GNU/Linux 10 (buster)
  • Kernel (e.g. uname -a): 4.19.0-6-amd64
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): 1.2.1
  • Storage backend version (e.g. for ceph do ceph -v): 14.2.5
  • Kubernetes version (use kubectl version): 1.16.4
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_WARN 2 slow ops, oldest one blocked for 116618 sec, mon.a has slow ops; too few PGs per OSD (8 < min 30); 1/3 mons down, quorum a,c
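
The commands named in the list above can be run as follows; the namespace and pod names are assumptions taken from the pod listing earlier in this report:
kubectl version
kubectl -n rook-ceph exec -it rook-ceph-operator-6d8fb9498b-r9czf -- rook version
kubectl -n rook-ceph exec -it rook-ceph-tools-685d84df94-sr2xn -- ceph -v
kubectl -n rook-ceph exec -it rook-ceph-tools-685d84df94-sr2xn -- ceph health detail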

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

If you restart the operator pod, the CSI pods should get recreated. Can you paste the kubectl get cm -n <rook-namespace> output?
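
A minimal sketch of that suggestion, assuming the default rook-ceph namespace and the standard app=rook-ceph-operator label from operator.yaml (adjust if your install differs):
kubectl -n rook-ceph delete pod -l app=rook-ceph-operator   # the Deployment recreates the operator pod
kubectl -n rook-ceph get pods | grep csi                    # verify the CSI plugin pods come back
kubectl -n rook-ceph get cm                                 # the ConfigMap listing requested above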