rook: Rook ceph OSD pods in Init:CrashLoopBackOff
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior: Rook Ceph OSD pods are in Init:CrashLoopBackOff after provisioning one PVC.
Expected behavior: Pods should not be in Init:CrashLoopBackOff.
How to reproduce it (minimal and precise): Create a Rook cluster on PVCs. Create a PVC using the Rook storage class; the OSD pods then enter the Init:CrashLoopBackOff state. The expand-bluefs init container crashes continuously with the logs below (a sketch of such a PVC follows the log).
2021-05-19T06:36:05.166+0000 7f420d02f240 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.11/rpm/el8/BUILD/ceph-15.2.11/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7f420d02f240 time 2021-05-19T06:36:05.167168+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.11/rpm/el8/BUILD/ceph-15.2.11/src/os/bluestore/BlueStore.cc: 7165: FAILED ceph_assert(r == 0)
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f420313696a]
2: (()+0x27ab84) [0x7f4203136b84]
3: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x563b90ebd3e1]
4: (main()+0x2c81) [0x563b90db4001]
5: (__libc_start_main()+0xf3) [0x7f42009617b3]
6: (_start()+0x2e) [0x563b90dd4cfe]
*** Caught signal (Aborted) **
in thread 7f420d02f240 thread_name:ceph-bluestore-
2021-05-19T06:36:05.167+0000 7f420d02f240 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.11/rpm/el8/BUILD/ceph-15.2.11/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7f420d02f240 time 2021-05-19T06:36:05.167168+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.11/rpm/el8/BUILD/ceph-15.2.11/src/os/bluestore/BlueStore.cc: 7165: FAILED ceph_assert(r == 0)
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f420313696a]
2: (()+0x27ab84) [0x7f4203136b84]
3: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x563b90ebd3e1]
4: (main()+0x2c81) [0x563b90db4001]
5: (__libc_start_main()+0xf3) [0x7f42009617b3]
6: (_start()+0x2e) [0x563b90dd4cfe]
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
1: (()+0x12b20) [0x7f4202330b20]
2: (gsignal()+0x10f) [0x7f42009757ff]
3: (abort()+0x127) [0x7f420095fc35]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f42031369bb]
5: (()+0x27ab84) [0x7f4203136b84]
6: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x563b90ebd3e1]
7: (main()+0x2c81) [0x563b90db4001]
8: (__libc_start_main()+0xf3) [0x7f42009617b3]
9: (_start()+0x2e) [0x563b90dd4cfe]
2021-05-19T06:36:05.169+0000 7f420d02f240 -1 *** Caught signal (Aborted) **
in thread 7f420d02f240 thread_name:ceph-bluestore-
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
1: (()+0x12b20) [0x7f4202330b20]
2: (gsignal()+0x10f) [0x7f42009757ff]
3: (abort()+0x127) [0x7f420095fc35]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f42031369bb]
5: (()+0x27ab84) [0x7f4203136b84]
6: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x563b90ebd3e1]
7: (main()+0x2c81) [0x563b90db4001]
8: (__libc_start_main()+0xf3) [0x7f42009617b3]
9: (_start()+0x2e) [0x563b90dd4cfe]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
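For reference, the reproduce step above provisions a block PVC against the Rook-provided storage class. A minimal sketch of such a PVC is shown below; the storage class name rook-ceph-block and the requested size are assumptions for illustration, not values taken from this report.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc                       # illustrative name
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                     # illustrative size
  storageClassName: rook-ceph-block    # assumed name of the Rook/Ceph RBD storage class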
File(s) to submit:
cluster.yaml
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v15.2.11
  cleanupPolicy:
    sanitizeDisks: {}
  crashCollector: {}
  dashboard:
    enabled: true
    ssl: true
  dataDirHostPath: /var/lib/rook
  disruptionManagement:
    machineDisruptionBudgetNamespace: openshift-machine-api
    managePodBudgets: true
    osdMaintenanceTimeout: 30
  external: {}
  healthCheck:
    daemonHealth:
      mon: {}
      osd: {}
      status: {}
  logCollector: {}
  mgr:
    count: 1
    modules:
    - enabled: true
      name: pg_autoscaler
  mon:
    count: 3
  monitoring: {}
  network:
    ipFamily: IPv4
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-node
      tolerations:
      - effect: NoSchedule
        operator: Exists
  security:
    kms: {}
  storage:
    storageClassDeviceSets:
    - count: 3
      name: set0
      placement: {}
      resources: {}
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
          storageClassName: rook-local-storage
          volumeMode: Block
status: {}
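The volumeClaimTemplates above reference storageClassName: rook-local-storage, which is not included in this report. A minimal sketch of what such a statically provisioned local storage class typically looks like (an assumption, not the reporter's actual manifest):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-local-storage
provisioner: kubernetes.io/no-provisioner   # local PVs are created statically, not dynamically
volumeBindingMode: WaitForFirstConsumer     # bind only once the consuming OSD pod is scheduled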
Environment:
- OS (e.g. from /etc/os-release):
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. uname -a): Linux devops013-mst-01 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration:
- Rook version (use rook version inside of a Rook Pod): 1.6.2
- Storage backend version (e.g. for ceph do ceph -v): 15.2.11
- Kubernetes version (use kubectl version): v1.20.4
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm
- Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
  cluster:
    id:     83eb664c-8089-43fe-89ad-ac3a0f31af69
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
            clock skew detected on mon.b, mon.c

  services:
    mon: 3 daemons, quorum a,b,c (age 45m)
    mgr: a(active, since 44m)
    osd: 3 osds: 3 up (since 45m), 3 in (since 45m)

  data:
    pools:   2 pools, 33 pgs
    objects: 124 objects, 380 MiB
    usage:   4.1 GiB used, 380 GiB / 384 GiB avail
    pgs:     33 active+clean

  io:
    client: 53 KiB/s wr, 0 op/s rd, 0 op/s wr
[root@rook-ceph-tools-78cdfd976c-hlcrj /]# ceph health
HEALTH_WARN mons are allowing insecure global_id reclaim; clock skew detected on mon.b, mon.c
[root@rook-ceph-tools-78cdfd976c-hlcrj /]#
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (6 by maintainers)
I have three servers having the same disk /dev/vdd; there is one PV per host.

@satoru-takeuchi No probs, here you go
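For context, a local PersistentVolume of the shape described above (a raw Block-mode device at /dev/vdd, one PV per host, bound to the rook-local-storage class) might look like the following sketch; the PV name, hostname, and capacity are illustrative and not taken from the reporter's environment.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-vdd-node1               # illustrative name; one such PV exists per host
spec:
  capacity:
    storage: 100Gi                    # matches the 100Gi request in the volumeClaimTemplates
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: rook-local-storage
  volumeMode: Block                   # raw block device consumed directly by the OSD
  local:
    path: /dev/vdd
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1                     # illustrative hostname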