rook: rook-ceph-osd pods in CrashLoopBackOff

Is this a bug report or feature request? Bug report.

After redeploying the Ceph cluster (reproduction steps below), the rook-ceph-osd pods never become ready and stay in CrashLoopBackOff. Output of kubectl -n rook-ceph get pods -o wide:

NAME                                  READY   STATUS             RESTARTS   AGE   IP               NODE
rook-ceph-agent-j6s9c                 1/1     Running            0          2h    192.168.11.102   t102
rook-ceph-agent-jkrvb                 1/1     Running            0          2h    192.168.11.101   t101
rook-ceph-agent-r6tjc                 1/1     Running            0          2h    192.168.11.103   t103
rook-ceph-mgr-a-5855fc9dc6-4vcxw      1/1     Running            0          59m   10.244.1.27      t102
rook-ceph-mon-a-79896cfbb7-nnkbb      1/1     Running            0          59m   10.244.2.41      t101
rook-ceph-mon-b-767fc6ffd-prvl6       1/1     Running            0          59m   10.244.1.26      t102
rook-ceph-operator-5c75765cdc-mk7l5   1/1     Running            0          2h    10.244.1.14      t102
rook-ceph-osd-0-6f787df69b-lgwnq      0/1     CrashLoopBackOff   16         59m   10.244.1.29      t102
rook-ceph-osd-1-57fdfdf548-ffvmn      0/1     CrashLoopBackOff   16         59m   10.244.2.43      t101
rook-ceph-osd-prepare-t101-l24q7      0/2     Completed          1          59m   10.244.2.42      t101
rook-ceph-osd-prepare-t102-xmplf      0/2     Completed          0          59m   10.244.1.28      t102
rook-discover-9kgn4                   1/1     Running            0          2h    10.244.2.37      t101
rook-discover-fm9m9                   1/1     Running            0          2h    10.244.1.15      t102
rook-discover-prb8n                   1/1     Running            0          2h    10.244.3.188     t103
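
The crash details come from the pod logs. A quick sketch for pulling them (pod name taken from the listing above; --previous shows the log of the last crashed container once the pod has restarted):

# Events and restart reason for the crashing OSD pod
kubectl -n rook-ceph describe pod rook-ceph-osd-0-6f787df69b-lgwnq
# Log of the current and the previously crashed container
kubectl -n rook-ceph logs rook-ceph-osd-0-6f787df69b-lgwnq
kubectl -n rook-ceph logs rook-ceph-osd-0-6f787df69b-lgwnq --previous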

The logs of the rook-ceph-osd-0-6f787df69b-lgwnq pod are as follows (the other OSD pod gives similar logs):

2019-05-11 14:21:33.690382 I | rookcmd: starting Rook v1.0.0-13.g05b0166 with arguments '/rook/rook ceph osd start -- --foreground --id 0 --osd-uuid 40591132-ce12-4506-9b5c-f776610f504c --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false'
2019-05-11 14:21:33.690480 I | rookcmd: flag values: --help=false, --log-flush-frequency=5s, --log-level=INFO, --osd-id=0, --osd-store-type=bluestore, --osd-uuid=40591132-ce12-4506-9b5c-f776610f504c
2019-05-11 14:21:33.690488 I | op-mon: parsing mon endpoints: 
2019-05-11 14:21:33.690493 W | op-mon: ignoring invalid monitor 
2019-05-11 14:21:33.690731 I | exec: Running command: stdbuf -oL ceph-volume lvm activate --no-systemd --bluestore 0 40591132-ce12-4506-9b5c-f776610f504c
2019-05-11 14:21:33.842278 I | Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:33.867411 I | Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:33.892591 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:33.916502 I | Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-93f1aef5-b5f4-4543-8bb6-440ef2cf57b7/osd-data-9f6d206a-a663-474f-b1b0-1eed29a3f220 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
2019-05-11 14:21:33.997057 I | Running command: /bin/ln -snf /dev/ceph-93f1aef5-b5f4-4543-8bb6-440ef2cf57b7/osd-data-9f6d206a-a663-474f-b1b0-1eed29a3f220 /var/lib/ceph/osd/ceph-0/block
2019-05-11 14:21:34.021148 I | Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
2019-05-11 14:21:34.045753 I | Running command: /bin/chown -R ceph:ceph /dev/mapper/ceph--93f1aef5--b5f4--4543--8bb6--440ef2cf57b7-osd--data--9f6d206a--a663--474f--b1b0--1eed29a3f220
2019-05-11 14:21:34.070145 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:34.094479 I | --> ceph-volume lvm activate successful for osd ID: 0
2019-05-11 14:21:34.102444 I | exec: Running command: ceph-osd --foreground --id 0 --osd-uuid 40591132-ce12-4506-9b5c-f776610f504c --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false
2019-05-11 14:21:34.158330 I | failed to fetch mon config (--no-mon-config to skip)
failed to start osd. Failed to complete '': exit status 1. 
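
The empty "parsing mon endpoints:" line, the "ignoring invalid monitor" warning, and the final "failed to fetch mon config" all suggest the OSD was started without usable monitor addresses. A sketch for checking what Rook has recorded (rook-ceph-mon-endpoints is the ConfigMap recent Rook releases maintain for this, adjust if yours differs; the config path comes from the --conf argument in the log above):

# Monitor addresses Rook has recorded for the cluster
kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o yaml
# On the node running the failing OSD (t102 here), the config the OSD was started with
cat /var/lib/rook/osd0/rook-ceph.config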

How to reproduce it (minimal and precise):

  1. Run kubectl delete -f my_cluster.yaml
  2. Delete the directory /var/lib/rook from every node (see the device cleanup sketch after this list)
  3. Run kubectl apply -f my_cluster.yaml
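
Note that step 2 only clears the host data directory; the OSD disks themselves still carry the LVM volumes and bluestore metadata from the previous deployment (the ceph-93f1aef5-... volume group is visible in the activate log above). A cleanup sketch for those disks, assuming /dev/sdb on t101 and t102 as listed in my_cluster.yaml; this is destructive and only for the devices handed to Rook:

# Wipe partition table and filesystem/LVM signatures from the OSD device
sgdisk --zap-all /dev/sdb
wipefs --all /dev/sdb
# Remove leftover ceph-* device-mapper entries from the previous OSDs, if any
ls /dev/mapper/ceph-* 2>/dev/null | xargs -I% -- dmsetup remove %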

The my_cluster.yaml file is as follows:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.1-20190430
    allowUnsupported: true
  dataDirHostPath: /var/lib/rook
  mon:
    count: 2
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  network:
    hostNetwork: false
  rbdMirroring:
    workers: 0
  annotations:
  resources:
  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    location:
    config:
    nodes:
    - name: "t102"
      devices:
      - name: "sdb"
    - name: "t101"
      devices:
      - name: "sdb"
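
After re-applying this file, the logs of the osd-prepare jobs are usually the first thing to check when OSDs fail to come up, since they show how ceph-volume prepared (or reused) the devices. A sketch using the pod names from the listing above (specify a container explicitly if your kubectl does not support --all-containers):

kubectl -n rook-ceph logs rook-ceph-osd-prepare-t102-xmplf --all-containers
kubectl -n rook-ceph logs rook-ceph-osd-prepare-t101-l24q7 --all-containers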

Environment:

  • OS (e.g. from /etc/os-release):
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
uname -a
Linux t104 4.18.9-1.el7.elrepo.x86_64 #1 SMP Thu Sep 20 09:04:54 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): v1.0.0
  • Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.0-rc.1", GitCommit:"3e4aee86dfaf933f03e052859c0a1f52704d4fef", GitTreeState:"clean", BuildDate:"2018-09-18T21:08:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): On-premises (inside centos7) created via kubeadm

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 26 (1 by maintainers)

Most upvoted comments

There was a regression in ceph-volume (c-v) that was fixed in Ceph 14.2.4; Rook 1.1.1 ships that version, so I suppose this bug is fixed. Based on the last comment, I’m closing this; feel free to re-open if you have any more concerns.

Thanks.
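
For reference, the user-side fix implied by the comment above is to run a Ceph image that contains the ceph-volume fix (14.2.4 or later). A sketch of bumping the image on the existing CephCluster; the exact tag is an assumption, and the official Rook upgrade guide should be followed for the full procedure:

# Point the cluster at a Ceph image containing the ceph-volume fix (tag is an assumption)
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"cephVersion":{"image":"ceph/ceph:v14.2.4"}}}'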