rook: ceph-volume command hangs while adding OSDs to the cluster

Is this a bug report or feature request?

  • Bug Report

Hi, I was trying to add some OSDs to my Ceph cluster, and the osd-prepare job got stuck in the Running state. While scanning the job's log, I found that it hung while executing a ceph-volume command. We used the lvmconfig command and found that udev_sync and udev_rules were both 1; I've pasted the relevant content of lvm.conf from my device below.

2023-10-20 11:27:49.613068 D | cephosd: &{Name:sdd Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-SATA_QEMU_HARDDISK_QM00007 /dev/disk/by-path/pci-0000:00:1f.2-ata-4 /dev/disk/by-path/pci-0000:00:1f.2-ata-4.0 /dev/disk/by-uuid/2023-10-20-09-59-11-00 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00007 /dev/disk/by-id/scsi-0ATA_QEMU_HARDDISK_QM00007 /dev/disk/by-label/config-2 /dev/disk/by-diskseq/4 /dev/disk/by-id/scsi-1ATA_QEMU_HARDDISK_QM00007 Size:1048576 UUID:8a8532c9-2fdb-43ed-a5c4-8331065ed8d1 Serial:QEMU_HARDDISK_QM00007 Type:disk Rotational:true Readonly:false Partitions:[] Filesystem:iso9660 Mountpoint: Vendor:ATA Model:QEMU_HARDDISK WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/sdd KernelName:sdd Encrypted:false}
2023-10-20 11:27:49.613080 I | cephosd: skipping device "sda1" with mountpoint "boot"
2023-10-20 11:27:49.613085 I | cephosd: skipping device "sda2" with mountpoint "rootfs"
2023-10-20 11:27:49.613089 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
2023-10-20 11:27:49.614402 D | exec: Running command: lsblk /dev/sdb --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-10-20 11:27:49.620276 D | sys: lsblk output: "SIZE=\"214748364800\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdb\" KNAME=\"/dev/sdb\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-10-20 11:27:49.620308 D | exec: Running command: ceph-volume inventory --format json /dev/sdb
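
For what it's worth, the hang can be narrowed down outside of Rook by running the same external command with a hard deadline. The following is a minimal standalone sketch, not Rook's code: it assumes ceph-volume is available on the node, reuses the device path /dev/sdb from the log above, and picks an arbitrary two-minute timeout.

package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Give the command a hard deadline so a hang surfaces as a timeout
	// instead of blocking the caller forever.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Same command the prepare pod was stuck on; /dev/sdb is taken from
	// the log above and may differ on other nodes.
	cmd := exec.CommandContext(ctx, "ceph-volume", "inventory", "--format", "json", "/dev/sdb")
	out, err := cmd.CombinedOutput()
	if ctx.Err() == context.DeadlineExceeded {
		fmt.Println("ceph-volume did not return within 2 minutes (hang reproduced)")
		return
	}
	if err != nil {
		fmt.Printf("ceph-volume failed: %v\n%s\n", err, out)
		return
	}
	fmt.Printf("ceph-volume finished:\n%s\n", out)
}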

I found a similar issue here, but it didn't help. Here is the lvm.conf content from my device mentioned above:

        # Configuration option activation/udev_sync.
        # Use udev notifications to synchronize udev and LVM.
        # The --noudevsync option overrides this setting.
        # When disabled, LVM commands will not wait for notifications from
        # udev, but continue irrespective of any possible udev processing in
        # the background. Only use this if udev is not running or has rules
        # that ignore the devices LVM creates. If enabled when udev is not
        # running, and LVM processes are waiting for udev, run the command
        # 'dmsetup udevcomplete_all' to wake them up.
        # This configuration option has an automatic default value.
        # udev_sync = 1

        # Configuration option activation/udev_rules.
        # Use udev rules to manage LV device nodes and symlinks.
        # When disabled, LVM will manage the device nodes and symlinks for
        # active LVs itself. Manual intervention may be required if this
        # setting is changed while LVs are active.
        # This configuration option has an automatic default value.
        # udev_rules = 1

From the above we can see that the default values of udev_rules and udev_sync are 1. After searching Rook's source code, we found the following code (excerpted from UpdateLVMConfig):

func UpdateLVMConfig(context *clusterd.Context, onPVC, lvBackedPV bool) error {

	input, err := os.ReadFile(lvmConfPath)
	if err != nil {
		return errors.Wrapf(err, "failed to read lvm config file %q", lvmConfPath)
	}

	output := bytes.Replace(input, []byte("udev_sync = 1"), []byte("udev_sync = 0"), 1)
	output = bytes.Replace(output, []byte("allow_changes_with_duplicate_pvs = 0"), []byte("allow_changes_with_duplicate_pvs = 1"), 1)
	output = bytes.Replace(output, []byte("udev_rules = 1"), []byte("udev_rules = 0"), 1)
	output = bytes.Replace(output, []byte("use_lvmetad = 1"), []byte("use_lvmetad = 0"), 1)
	output = bytes.Replace(output, []byte("obtain_device_list_from_udev = 1"), []byte("obtain_device_list_from_udev = 0"), 1)

Obviously these settings didn't take effect in my case. Have you faced situations like this, and could you push a PR to fix this for the version (11.4) we are using?

How to reproduce it (minimal and precise):

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary

Logs to submit:

  • Operator’s logs, if necessary

  • Crashing pod(s) logs, if necessary

    To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.

Cluster Status to submit:

  • Output of kubectl commands, if necessary

    To get the health of the cluster, use kubectl rook-ceph health. To get the status of the cluster, use kubectl rook-ceph ceph status. For more details, see the Rook kubectl Plugin documentation.

Environment:

  • OS (e.g. from /etc/os-release): RHEL 9.2
  • Kernel (e.g. uname -a): 5.14.0-284.11.1
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): 11.4
  • Storage backend version (e.g. for ceph do ceph -v): 17.2.6
  • Kubernetes version (use kubectl version): 1.22.6
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): RKE2
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Comments: 27 (13 by maintainers)

Most upvoted comments

Does that mean all of the commands exited without hanging?

yes

You mean this problem doesn't happen on these OSes? This is important information for the investigation. IIRC, Rook uses some commands that exist in the host OS, so this problem might depend on the versions of those tools.

Yes, I've tested Rook on CentOS 7.9, Ubuntu 22.04, and Red Hat 7.9 with the same images, and they were all working well.

@CrossainQi I couldn’t reproduce your problem. I’m investigating the log of your OSD prepare pod in detail.

I have some questions.

  1. You use a custom Ceph image (ecr-sh.yun.test.cn/alkaid/ceph/ceph:v17.2.6). What is the difference between the vanilla v17.2.6 image and this one?
  2. Could you provide the result of blkid again? Probably something went wrong in your copy-and-paste.

# blkid
^@^@^@^@^@^@^@

  3. There are some Ceph block devices (rbdX) on the node. Are they from this cluster or from other Ceph clusters?

And the answers:

  1. The Ceph image we are using is the same as the vanilla v17.2.6 image.
  2. The blkid command got stuck while executing.
  3. The rbdX devices were created by other Ceph clusters; I rebuilt the instance before adding the node to the new cluster.

Not yet. I'm trying to reproduce this problem as a first step.