rook: OSD prepare job fails with KeyError: 'KNAME'
I removed a broken OSD following the Rook OSD removal guide, then deleted the corresponding block-mode PV, PVC, and OSD Deployment. I closed the OSD’s LUKS volume via `cryptsetup close` and wiped the disk via `wipefs -a`.
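A minimal sketch of that removal sequence, with placeholder resource and device names (the real names are cluster-specific):

```sh
# Delete the Kubernetes objects backing the broken OSD (names are placeholders).
kubectl -n rook-ceph delete deployment rook-ceph-osd-4
kubectl -n rook-ceph delete pvc <osd-data-pvc>   # the block-mode PVC
kubectl delete pv <osd-data-pv>

# On the host that owns the disk:
cryptsetup close <osd-dm-name>   # close the OSD's LUKS mapping
wipefs -a /dev/<disk>            # wipe all filesystem/LUKS signatures
```

Afterwards I recreated the PV and expected Rook to automatically recreate the OSD. However, the OSD prepare job fails and its log contains the following backtrace: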
```
[2022-11-13 18:27:04,778][ceph_volume.devices.raw.prepare][ERROR ] raw prepare was unable to complete
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 91, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 134, in prepare
    tmpfs,
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 51, in prepare_bluestore
    block = prepare_dmcrypt(key, block, 'block', fsid)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 23, in prepare_dmcrypt
    kname = disk.lsblk(device)['KNAME']
KeyError: 'KNAME'
[2022-11-13 18:27:04,780][ceph_volume.devices.raw.prepare][INFO ] will rollback OSD ID creation
[2022-11-13 18:27:04,781][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.4 --yes-i-really-mean-it
[2022-11-13 18:27:05,553][ceph_volume.process][INFO ] stderr purged osd.4
[2022-11-13 18:27:05,571][ceph_volume.process][INFO ] Running command: /usr/bin/systemctl is-active ceph-osd@4
[2022-11-13 18:27:05,584][ceph_volume.process][INFO ] stderr System has not been booted with systemd as init system (PID 1). Can't operate.
[2022-11-13 18:27:05,585][ceph_volume.process][INFO ] stderr Failed to connect to bus: Host is down
[2022-11-13 18:27:05,589][ceph_volume.util.system][WARNING] Executable lvs not found on the host, will return lvs as-is
[2022-11-13 18:27:05,590][ceph_volume.process][INFO ] Running command: lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=4} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2022-11-13 18:27:05,969][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 91, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 134, in prepare
    tmpfs,
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 51, in prepare_bluestore
    block = prepare_dmcrypt(key, block, 'block', fsid)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 23, in prepare_dmcrypt
    kname = disk.lsblk(device)['KNAME']
KeyError: 'KNAME'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 169, in main
    self.safe_prepare(self.args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 95, in safe_prepare
    rollback_osd(self.args, self.osd_id)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
    Zap(['--destroy', '--osd-id', osd_id]).main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 404, in main
    self.zap_osd()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 301, in zap_osd
    devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
    '%s' % osd_id or osd_fsid)
RuntimeError: Unable to find any LV for zapping OSD: 4
```
The full log is attached in a GitHub Gist below.
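To read the backtrace: `prepare_dmcrypt()` opens the dm-crypt device and then looks up its kernel device name via `disk.lsblk(device)['KNAME']`; the `KeyError` means the lookup came back empty. The second traceback is only a secondary failure: the rollback path (`Zap`) is LVM-based, and a raw-mode OSD has no LV for it to find. For illustration, what a successful `KNAME` lookup returns (the device path here is a placeholder):

```sh
# Illustration only; /dev/dm-1 stands in for the OSD's dm-crypt device node.
lsblk --nodeps --pairs --output KNAME /dev/dm-1
# Expected output: KNAME="dm-1"
```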
Is this a bug report or feature request? Bug Report
Deviation from expected behavior: The OSD prepare job fails, so the removed OSD is never recreated.
File(s) to submit: Cluster CR: https://gist.github.com/haslersn/57251739d58ee88dd643237cc847e16e#file-cluster-yaml
Logs to submit: Crashing rook-ceph-osd-prepare pod logs: https://gist.github.com/haslersn/57251739d58ee88dd643237cc847e16e#file-rook-ceph-osd-prepare-ssd-sata-0-data-8mqrmn
Cluster Status to submit:
```
$ ceph status
  cluster:
    id:     e6e99116-5ed6-4b09-b6cd-47b989beb3dd
    health: HEALTH_WARN
            342 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,e,f (age 49m)
    mgr: a(active, since 4h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 19 osds: 19 up (since 49m), 19 in (since 49m)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 14.12k objects, 2.7 GiB
    usage:   8.1 GiB used, 25 TiB / 25 TiB avail
    pgs:     97 active+clean

  io:
    client: 2.7 KiB/s rd, 1.2 KiB/s wr, 2 op/s rd, 0 op/s wr
```
(I think the “342 daemons have recently crashed” warning relates to the former OSD, which I purged from the cluster. It was in a state where its pod crashed immediately after startup, over and over, until throttled by CrashLoopBackOff. The other 19 OSDs are not affected.)
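As an aside, once the old OSD’s crash reports are no longer of interest, that warning can be cleared by archiving them:

```sh
# Inspect the recorded crash reports, then archive them all;
# the "daemons have recently crashed" warning clears afterwards.
ceph crash ls
ceph crash archive-all
```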
Environment:
- OS: Debian GNU/Linux 11 (bullseye)
- Kernel: 5.10.0-15-amd64, Debian 5.10.120-1 (2022-06-09), x86_64 GNU/Linux
- Cloud provider or hardware configuration: bare metal
- Rook version: v1.10.5
- Storage backend version: 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
- Kubernetes version: v1.21.7
- Kubernetes cluster type: kubeadm
About this issue
- Original URL: https://github.com/rook/rook/issues/11304
- State: closed
- Created 2 years ago
- Comments: 19 (11 by maintainers)
Commits related to this issue
- fix(rook): encrypted OSD prepare fail on >v17.2.3 https://github.com/rook/rook/issues/11304#issuecomment-1321286046 — committed to JJGadgets/Biohazard by JJGadgets a year ago
Fix under review: https://github.com/ceph/ceph/pull/49171
I think I found the underlying issue. In Ceph v17.2.3, looking up the encrypted OSD device via lsblk worked. From Ceph v17.2.4 onwards, the same lookup returns an empty result, which is exactly what makes `disk.lsblk(device)['KNAME']` raise the `KeyError`. It does work, however, when the udev-created block device file is specified instead. A sketch of the comparison follows.
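A hypothetical reconstruction of that comparison, using ceph-volume’s own wrapper and placeholder device paths:

```sh
# Hypothetical reconstruction; both device paths are placeholders.
# Query the dm-crypt mapping through the same wrapper the traceback fails in:
python3 -c 'from ceph_volume.util import disk; print(disk.lsblk("/dev/mapper/<name>"))'
# Ceph v17.2.3:  prints the device's properties, including 'KNAME'
# Ceph v17.2.4+: prints an empty result, so ['KNAME'] raises KeyError

python3 -c 'from ceph_volume.util import disk; print(disk.lsblk("<udev-created device file>"))'
# Ceph v17.2.4+: still returns the populated properties when the
# udev-created block device file for the mapping is given instead
```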
Yes, but what I’m saying is that it’s merged into `main` but not into the quincy/pacific branches.
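For anyone tracking the backport, one way to check whether the fix commit has landed in a release branch yet (the commit hash is a placeholder):

```sh
# <sha> is the merge commit of the fix in ceph/ceph.
git clone --bare https://github.com/ceph/ceph.git && cd ceph.git
git branch -a --contains <sha> | grep -E 'quincy|pacific'   # release branches
git tag --contains <sha>                                    # tagged releases
```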
sure