rook: Deployment failure - First mon pod gets into Init:CrashLoopBackOff

Description:

During rook-ceph deployment, the mon deployment is failing; describing the first mon pod shows the following events:

Events:
  Type     Reason     Age                  From                                                 Message
  ----     ------     ----                 ----                                                 -------
  Normal   Scheduled  26m                  default-scheduler                                    Successfully assigned rook-ceph/rook-ceph-mon-a-7bcd58b586-4mbjh to ip-10-0-175-110.us-east-2.compute.internal
  Normal   Pulled     25m (x5 over 26m)    kubelet, ip-10-0-175-110.us-east-2.compute.internal  Container image "ceph/daemon-base:latest-nautilus-devel" already present on machine
  Normal   Created    25m (x5 over 26m)    kubelet, ip-10-0-175-110.us-east-2.compute.internal  Created container chown-container-data-dir
  Normal   Started    25m (x5 over 26m)    kubelet, ip-10-0-175-110.us-east-2.compute.internal  Started container chown-container-data-dir
  Warning  BackOff    88s (x117 over 26m)  kubelet, ip-10-0-175-110.us-east-2.compute.internal  Back-off restarting failed container

$ oc get pods -n rook-ceph
NAME                                  READY   STATUS                  RESTARTS   AGE
csi-cephfsplugin-4qmng                2/2     Running                 0          27m
csi-cephfsplugin-dqvb6                2/2     Running                 0          27m
csi-cephfsplugin-provisioner-0        3/3     Running                 0          27m
csi-cephfsplugin-zt4np                2/2     Running                 0          27m
csi-rbdplugin-8zb78                   2/2     Running                 0          27m
csi-rbdplugin-9k4kv                   2/2     Running                 0          27m
csi-rbdplugin-mxsmn                   2/2     Running                 0          27m
csi-rbdplugin-provisioner-0           4/4     Running                 0          27m
rook-ceph-agent-pkm4p                 1/1     Running                 0          27m
rook-ceph-agent-rlmzf                 1/1     Running                 0          27m
rook-ceph-agent-s7xtz                 1/1     Running                 0          27m
rook-ceph-mon-a-7bcd58b586-4mbjh      0/1     Init:CrashLoopBackOff   10         27m
rook-ceph-operator-5c6fd4b7db-x4qpq   1/1     Running                 0          28m
rook-discover-dktsz                   1/1     Running                 0          28m
rook-discover-hd6bk                   1/1     Running                 0          28m
rook-discover-hrps6                   1/1     Running                 0          28m

Environment:

  • OS (e.g. from /etc/os-release): Red Hat Enterprise Linux CoreOS 410.8.20190627.0 (Ootpa)

  • Kernel (e.g. uname -a): Linux ip-10-0-146-110 4.18.0-80.4.2.el8_0.x86_64 #1 SMP Fri Jun 14 13:20:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Cloud provider or hardware configuration: AWS (us-east-2, per the node hostnames)

  • Rook version (use rook version inside of a Rook Pod): rook: v1.0.0-268.g3f67e8f
$ oc describe pod rook-ceph-operator-5c6fd4b7db-x4qpq -n rook-ceph |grep -i image
    Image:         rook/ceph:master
    Image ID:      docker.io/rook/ceph@sha256:fadae8cbe112f2779f24cbe88430c730b965b249f2f5bf427602bce61926bbf8
      ROOK_CSI_CEPH_IMAGE:                quay.io/cephcsi/cephcsi:canary
      ROOK_CSI_REGISTRAR_IMAGE:           quay.io/k8scsi/csi-node-driver-registrar:v1.1.0
      ROOK_CSI_PROVISIONER_IMAGE:         quay.io/k8scsi/csi-provisioner:v1.2.0
      ROOK_CSI_SNAPSHOTTER_IMAGE:         quay.io/k8scsi/csi-snapshotter:v1.1.0
      ROOK_CSI_ATTACHER_IMAGE:            quay.io/k8scsi/csi-attacher:v1.1.1
  • Storage backend version (e.g. for ceph do ceph -v): ceph/daemon-base:latest-nautilus-devel
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): OpenShift
    $ oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.1.4     True        False         74m     Cluster version is 4.1.4

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 19 (7 by maintainers)

Most upvoted comments

Found the fix. Need to change the ROOK_HOSTPATH_REQUIRES_PRIVILEGED value from false to true in the operator.yaml.
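
For reference, a minimal sketch of the relevant entry in operator.yaml, assuming the standard rook-ceph-operator Deployment layout (the rest of the spec is omitted):

# operator.yaml - rook-ceph-operator Deployment (abbreviated)
spec:
  template:
    spec:
      containers:
      - name: rook-ceph-operator
        env:
        # The mon/OSD data dirs live on a hostPath; per this issue, on OpenShift
        # the chown in the init container only succeeds when the daemon pods
        # run privileged.
        - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
          value: "true"

The same value can also be set on an already running operator, which triggers a rollout of the operator pod:

$ oc -n rook-ceph set env deploy/rook-ceph-operator ROOK_HOSTPATH_REQUIRES_PRIVILEGED=true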

@leseb thank you for such a quick turnaround. cheers !

@leseb I see

$ oc logs rook-ceph-mon-a-65b9d47854-wjqdw -c chown-container-data-dir                                                  
failed to change ownership of '/var/lib/ceph/mon/ceph-a' from root:root to ceph:ceph
chown: changing ownership of '/var/lib/ceph/mon/ceph-a': Permission denied
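
For context: that chown comes from the chown-container-data-dir init container, which has to take ownership of the hostPath-backed mon data directory (/var/lib/ceph/mon/ceph-a); on this OpenShift cluster the unprivileged init container is denied that operation (most likely by SELinux), which is what ROOK_HOSTPATH_REQUIRES_PRIVILEGED=true works around. Once the operator re-creates the mon pods, a quick sanity check is to confirm the init container now runs privileged (pod name copied from the log above; substitute whatever the current mon pod is called):

$ oc -n rook-ceph get pod rook-ceph-mon-a-65b9d47854-wjqdw \
    -o jsonpath='{.spec.initContainers[*].securityContext}'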