rook: CSI plugin registration failing and pods stuck in ContainerCreating

  • Bug Report

Deviation from expected behavior: the kubelet service logs these messages every 2 minutes:

Nov 19 09:35:12 fpig-kubeletl022 kubelet[82977]: E1119 09:35:12.385237   82977 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock" failed. No retries permitted until 2019-11-19 09:37:14.385220494 -0600 CST m=+639.587100249 (durationBeforeRetry 2m2s). Error: "RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock, err: rpc error: code = Unimplemented desc = unknown service pluginregistration.Registration"
Nov 19 09:35:12 fpig-kubeletl022 kubelet[82977]: E1119 09:35:12.385343   82977 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock" failed. No retries permitted until 2019-11-19 09:37:14.385321555 -0600 CST m=+639.587201325 (durationBeforeRetry 2m2s). Error: "RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock, err: rpc error: code = Unimplemented desc = unknown service pluginregistration.Registration

Any pod that I attach a Rook block PVC to gets stuck in ContainerCreating, and the kubelet logs the following messages:

Nov 19 09:55:15 fpig-kubeletl042 kubelet[55855]: E1119 09:55:15.298963   55855 driver-call.go:274] mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume replicapool/pvc-65b12ba9-eb87-496a-acd3-46144bd35144: failed to map image replicapool/pvc-65b12ba9-eb87-496a-acd3-46144bd35144 cluster rook-ceph. failed to map image replicapool/pvc-65b12ba9-eb87-496a-acd3-46144bd35144, output: , err: Failed to complete 'rbd': signal: interrupt.
Nov 19 09:55:15 fpig-kubeletl042 kubelet[55855]: W1119 09:55:15.298978   55855 driver-call.go:150] FlexVolume: driver call failed: executable: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/rook.io~rook-ceph/rook-ceph, args: [mount /var/lib/kubelet/pods/bbd911c1-04c3-4ddc-a2c0-ea694159d795/volumes/rook.io~rook-ceph/pvc-65b12ba9-eb87-496a-acd3-46144bd35144 {"clusterNamespace":"rook-ceph","dataBlockPool":"","image":"pvc-65b12ba9-eb87-496a-acd3-46144bd35144","kubernetes.io/fsType":"","kubernetes.io/pod.name":"test","kubernetes.io/pod.namespace":"default","kubernetes.io/pod.uid":"bbd911c1-04c3-4ddc-a2c0-ea694159d795","kubernetes.io/pvOrVolumeName":"pvc-65b12ba9-eb87-496a-acd3-46144bd35144","kubernetes.io/readwrite":"rw","kubernetes.io/serviceAccount.name":"default","pool":"replicapool","storageClass":"rook-block"}], error: exit status 1, output: "{\"status\":\"Failure\",\"message\":\"Rook: Mount volume failed: failed to attach volume replicapool/pvc-65b12ba9-eb87-496a-acd3-46144bd35144: failed to map image replicapool/pvc-65b12ba9-eb87-496a-acd3-46144bd35144 cl
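
This failure comes from the FlexVolume driver (`rook.io~rook-ceph`), not from the CSI driver whose sockets appear in the first set of logs. The JSON argument kubelet passed to the driver records which StorageClass (`rook-block`) and RBD image were involved. The following is a minimal Go sketch that decodes that argument. It assumes every value is a string, which holds for the payload quoted above. `flexOptions` and `decodeFlexOptions` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flexOptions is the JSON argument from the FlexVolume "mount" call logged above.
const flexOptions = `{"clusterNamespace":"rook-ceph","dataBlockPool":"","image":"pvc-65b12ba9-eb87-496a-acd3-46144bd35144","kubernetes.io/fsType":"","kubernetes.io/pod.name":"test","kubernetes.io/pod.namespace":"default","kubernetes.io/pod.uid":"bbd911c1-04c3-4ddc-a2c0-ea694159d795","kubernetes.io/pvOrVolumeName":"pvc-65b12ba9-eb87-496a-acd3-46144bd35144","kubernetes.io/readwrite":"rw","kubernetes.io/serviceAccount.name":"default","pool":"replicapool","storageClass":"rook-block"}`

// decodeFlexOptions parses the FlexVolume options into a string map
// (assumes all values are JSON strings).
func decodeFlexOptions(raw string) (map[string]string, error) {
	var opts map[string]string
	if err := json.Unmarshal([]byte(raw), &opts); err != nil {
		return nil, err
	}
	return opts, nil
}

func main() {
	opts, err := decodeFlexOptions(flexOptions)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// pool/image is the image the driver failed to map.
	fmt.Printf("image %s/%s, StorageClass %s\n", opts["pool"], opts["image"], opts["storageClass"])
}
```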

Expected behavior: No error messages and pods successfully mounting and running rook block volumes.

Environment:

  • OS (e.g. from /etc/os-release): RHEL 7.5
  • Kernel (e.g. uname -a): Linux 3.10.0-862.3.2.el7.x86_64
  • Cloud provider or hardware configuration: on-prem vanilla k8s
  • Rook version (use rook version inside of a Rook Pod): rook: v1.1.2
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
  • Kubernetes version (use kubectl version): v1.16.2
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): vanilla k8s
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
[root@rook-ceph-tools /]# ceph health
HEALTH_WARN BlueFS spillover detected on 25 OSD(s)

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 21 (8 by maintainers)

Most upvoted comments

The same issue

Thanks a lot for the help @Madhu-1.

We managed to fix this by switching our StorageClass to use the CSI driver. We also had a networking issue: we had a NetworkPolicy allowing traffic to the rook namespace, but the rbdplugin pod uses host networking, so that policy didn't apply to it.

Everything looks fine. Can you paste the root-dir from the kubelet configuration?
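
`root-dir` refers to the kubelet's `--root-dir` flag (default `/var/lib/kubelet`). The question is presumably meant to confirm that the kubelet's root directory matches the `/var/lib/kubelet/...` paths seen in the logs above. The following is a minimal Go sketch that reads a running kubelet's command line from `/proc/<pid>/cmdline` and reports the effective `--root-dir`. It assumes Linux procfs and that the flag, if set, is passed on the command line, for example through systemd unit arguments, which are already expanded by the time the process runs. `kubeletRootDir` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strings"
)

// defaultRootDir is the kubelet's default --root-dir.
const defaultRootDir = "/var/lib/kubelet"

// kubeletRootDir returns the --root-dir value from kubelet arguments, accepting
// "--root-dir=/path" and "--root-dir /path". If the flag is repeated, the last
// occurrence wins; if it is absent, the default is returned.
func kubeletRootDir(args []string) string {
	dir := defaultRootDir
	for i, a := range args {
		switch {
		case strings.HasPrefix(a, "--root-dir="):
			dir = strings.TrimPrefix(a, "--root-dir=")
		case a == "--root-dir" && i+1 < len(args):
			dir = args[i+1]
		}
	}
	return dir
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: kubelet-rootdir <kubelet-pid>")
		os.Exit(2)
	}
	// /proc/<pid>/cmdline holds the arguments separated by NUL bytes (Linux only).
	raw, err := os.ReadFile("/proc/" + os.Args[1] + "/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var args []string
	for _, p := range bytes.Split(bytes.TrimRight(raw, "\x00"), []byte{0}) {
		args = append(args, string(p))
	}
	fmt.Println(kubeletRootDir(args))
}
```

Usage on a node: `go run main.go "$(pgrep -o kubelet)"`.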