rook: rook-ceph-osd pods in CrashLoopBackOff
Is this a bug report or feature request?
- Bug Report
When I reinstall the rook-ceph cluster in my k8s cluster following this doc: https://rook.github.io/docs/rook/v1.0/ceph-quickstart.html, the rook-ceph-osd-* pods end up in the CrashLoopBackOff state:
NAME READY STATUS RESTARTS AGE IP NODE
rook-ceph-agent-j6s9c 1/1 Running 0 2h 192.168.11.102 t102
rook-ceph-agent-jkrvb 1/1 Running 0 2h 192.168.11.101 t101
rook-ceph-agent-r6tjc 1/1 Running 0 2h 192.168.11.103 t103
rook-ceph-mgr-a-5855fc9dc6-4vcxw 1/1 Running 0 59m 10.244.1.27 t102
rook-ceph-mon-a-79896cfbb7-nnkbb 1/1 Running 0 59m 10.244.2.41 t101
rook-ceph-mon-b-767fc6ffd-prvl6 1/1 Running 0 59m 10.244.1.26 t102
rook-ceph-operator-5c75765cdc-mk7l5 1/1 Running 0 2h 10.244.1.14 t102
rook-ceph-osd-0-6f787df69b-lgwnq 0/1 CrashLoopBackOff 16 59m 10.244.1.29 t102
rook-ceph-osd-1-57fdfdf548-ffvmn 0/1 CrashLoopBackOff 16 59m 10.244.2.43 t101
rook-ceph-osd-prepare-t101-l24q7 0/2 Completed 1 59m 10.244.2.42 t101
rook-ceph-osd-prepare-t102-xmplf 0/2 Completed 0 59m 10.244.1.28 t102
rook-discover-9kgn4 1/1 Running 0 2h 10.244.2.37 t101
rook-discover-fm9m9 1/1 Running 0 2h 10.244.1.15 t102
rook-discover-prb8n 1/1 Running 0 2h 10.244.3.188 t103
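For context, the listing above is the pod status in the cluster's namespace; a sketch of the command that produces it (assuming the default rook-ceph namespace from the quickstart) is:
kubectl -n rook-ceph get pod -o wide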
The logs of the rook-ceph-osd-0-6f787df69b-lgwnq pod are as follows (the other OSD pod gives similar logs):
2019-05-11 14:21:33.690382 I | rookcmd: starting Rook v1.0.0-13.g05b0166 with arguments '/rook/rook ceph osd start -- --foreground --id 0 --osd-uuid 40591132-ce12-4506-9b5c-f776610f504c --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false'
2019-05-11 14:21:33.690480 I | rookcmd: flag values: --help=false, --log-flush-frequency=5s, --log-level=INFO, --osd-id=0, --osd-store-type=bluestore, --osd-uuid=40591132-ce12-4506-9b5c-f776610f504c
2019-05-11 14:21:33.690488 I | op-mon: parsing mon endpoints:
2019-05-11 14:21:33.690493 W | op-mon: ignoring invalid monitor
2019-05-11 14:21:33.690731 I | exec: Running command: stdbuf -oL ceph-volume lvm activate --no-systemd --bluestore 0 40591132-ce12-4506-9b5c-f776610f504c
2019-05-11 14:21:33.842278 I | Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:33.867411 I | Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:33.892591 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:33.916502 I | Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-93f1aef5-b5f4-4543-8bb6-440ef2cf57b7/osd-data-9f6d206a-a663-474f-b1b0-1eed29a3f220 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
2019-05-11 14:21:33.997057 I | Running command: /bin/ln -snf /dev/ceph-93f1aef5-b5f4-4543-8bb6-440ef2cf57b7/osd-data-9f6d206a-a663-474f-b1b0-1eed29a3f220 /var/lib/ceph/osd/ceph-0/block
2019-05-11 14:21:34.021148 I | Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
2019-05-11 14:21:34.045753 I | Running command: /bin/chown -R ceph:ceph /dev/mapper/ceph--93f1aef5--b5f4--4543--8bb6--440ef2cf57b7-osd--data--9f6d206a--a663--474f--b1b0--1eed29a3f220
2019-05-11 14:21:34.070145 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2019-05-11 14:21:34.094479 I | --> ceph-volume lvm activate successful for osd ID: 0
2019-05-11 14:21:34.102444 I | exec: Running command: ceph-osd --foreground --id 0 --osd-uuid 40591132-ce12-4506-9b5c-f776610f504c --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false
2019-05-11 14:21:34.158330 I | failed to fetch mon config (--no-mon-config to skip)
failed to start osd. Failed to complete '': exit status 1.
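The warning "op-mon: ignoring invalid monitor" together with the final "failed to fetch mon config" suggests the OSD was started without usable mon endpoints. A minimal diagnostic sketch, assuming Rook's usual rook-ceph-mon-endpoints ConfigMap name; the config file path is taken from the log above, and since dataDirHostPath is /var/lib/rook it can be read directly on node t102 where osd-0 runs:
# mon endpoints Rook has recorded for this cluster (ConfigMap name assumed)
kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o yaml
# the config the OSD was started with, read on node t102
cat /var/lib/rook/osd0/rook-ceph.config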
How to reproduce it (minimal and precise; commands sketched below):
- Run kubectl delete -f my_cluster.yaml
- Delete the directory /var/lib/rook from every node
- Run kubectl apply -f my_cluster.yaml
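Roughly the same steps as shell commands (a sketch; node names and file name are taken from this report). Note that this procedure removes only /var/lib/rook; the LVM volumes that the previous install created on sdb (visible in the ceph-volume output above) are left in place:
kubectl delete -f my_cluster.yaml
# on every node (t101, t102, t103):
rm -rf /var/lib/rook
kubectl apply -f my_cluster.yaml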
The file my_cluster.yaml is as follows:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.1-20190430
    allowUnsupported: true
  dataDirHostPath: /var/lib/rook
  mon:
    count: 2
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  network:
    hostNetwork: false
  rbdMirroring:
    workers: 0
  annotations:
  resources:
  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    location:
    config:
    nodes:
    - name: "t102"
      devices:
      - name: "sdb"
    - name: "t101"
      devices:
      - name: "sdb"
Environment:
- OS (e.g. from /etc/os-release):
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. uname -a):
uname -a
Linux t104 4.18.9-1.el7.elrepo.x86_64 #1 SMP Thu Sep 20 09:04:54 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration:
- Rook version (use rook version inside of a Rook Pod): v1.0.0
- Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.0-rc.1", GitCommit:"3e4aee86dfaf933f03e052859c0a1f52704d4fef", GitTreeState:"clean", BuildDate:"2018-09-18T21:08:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): On-premises (CentOS 7), created via kubeadm
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 26 (1 by maintainers)
There was a regression in ceph-volume which was fixed in 14.2.4. Rook 1.1.1 ships that version, so I suppose this bug is fixed. Based on the last comment, I'm closing this; feel free to re-open if you have any more concerns.
Thanks.
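Based on that closing comment, the remedy would be to move to a Ceph image that contains the patched ceph-volume. A rough sketch, assuming the usual flow of updating the CephCluster CR so the operator rolls the daemons to the new image (the exact image tag should be verified against what is published):
kubectl -n rook-ceph edit cephcluster rook-ceph
# then change spec.cephVersion.image to a build containing the fix, e.g. ceph/ceph:v14.2.4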