rook: OSD Prepare fails due to "unparsable uuid"
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior: `rook-ceph-osd-prepare` fails due to "unparsable uuid", leaves behind dirty drives, ignores them on retry, and the CephCluster ends up with no OSDs.
Expected behavior: OSDs get prepared and CephCluster is alive and well.
How to reproduce it (minimal and precise):
Single-node k3s cluster on a dedicated server with 2 HDDs dedicated to Ceph; in my case this is managed by a rook-ceph-cluster Helm release.
File(s) to submit:
- Cluster CR (custom resource): cluster.yaml, values.yaml (values used in the rook-ceph-cluster Helm release)
- Operator’s logs, if necessary: rook-ceph-operator-7499bf8579-ksltg.log
- Crashing pod(s) logs, if necessary
Environment:
- OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS
- Kernel (e.g. `uname -a`): Linux ... 5.15.0-27-generic #28-Ubuntu SMP Thu Apr 14 04:55:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration: Hetzner AX-51 with 2 NVMe drives for the system & 2 HDDs for Ceph
- Rook version (use `rook version` inside of a Rook pod): v1.9.2
- Storage backend version (e.g. for Ceph do `ceph -v`): ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
- Kubernetes version (use `kubectl version`): Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6+k3s1", GitCommit:"418c3fa858b69b12b9cefbcff0526f666a6236b9", GitTreeState:"clean", BuildDate:"2022-04-28T22:16:18Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): k3s
- Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox): HEALTH_WARN Reduced data availability: 32 pgs inactive; OSD count 0 < osd_pool_default_size 1
I’ve been pointed to the following issues that seem similar: #9646 & #8023. However, both have different errors.
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 31 (18 by maintainers)
I’ve just had the same issue, fixed by raising the prepareosd memory resource limit from 400Mi to 800Mi. And yeah, `_read_fsid unparsable uuid` is not an error on its own. (rook-ceph 1.9.9)
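For reference, the fix described above can be applied through the CephCluster CR's `resources` section (or the equivalent key in the rook-ceph-cluster Helm values). This is a minimal sketch; the exact memory values are the ones mentioned in the comment, but the request values are illustrative:

```yaml
# Sketch: raise the memory limit of the OSD prepare jobs.
# Goes under spec: in the CephCluster CR (cephClusterSpec: in Helm values).
resources:
  prepareosd:
    limits:
      memory: "800Mi"   # raised from the 400Mi that caused the failure
    requests:
      cpu: "500m"       # illustrative value, not from this issue
      memory: "400Mi"
```

Raising only the limit (not the request) keeps scheduling behavior unchanged while giving `ceph-volume` enough headroom during OSD preparation.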
@travisn Many users have hit this problem, so it seems better to add a `resources` field for the prepare pods to the CephCluster CR examples by default. Does that make sense?
Shared it to your email address on your profile. Thanks
No, but I will contribute to https://tracker.ceph.com/issues/54019 which is apparently tracking this.
I confirmed that v1.8.9 works normally. I tested v1.9.0, 1.9.1, 1.9.2, and 1.9.3 and all produce the error above when preparing the OSD.
“_read_fsid unparsable uuid” itself is not a bug; it is also shown when OSD preparation succeeds. The messages indicating each actual problem are in the preceding lines of the log.
However, I couldn’t reproduce this problem yet. My conditions were as follows: `devices.name: "/dev/disk/by-id/XXXXXX"` and `devicePathFilter: "^/dev/disk/by-partlabel/OSD[0-9]+"`. In all combinations, the OSDs started without any problems.
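For context, the two device-selection settings tested above live in the CephCluster CR's `storage` section. A sketch of how they would be combined; node names and the device ID are placeholders, not values from this issue:

```yaml
# Sketch of the two storage selections tested in the comment above.
storage:
  useAllNodes: false
  useAllDevices: false
  nodes:
    - name: "node-a"            # placeholder node name
      devices:
        # select one device explicitly by its stable by-id path
        - name: "/dev/disk/by-id/XXXXXX"
    - name: "node-b"            # placeholder node name
      # select devices whose path matches a regex on the partition label
      devicePathFilter: "^/dev/disk/by-partlabel/OSD[0-9]+"
```

`devices.name` pins exact disks, while `devicePathFilter` matches any device whose path satisfies the regex; the maintainer reports OSDs coming up cleanly with both selection styles.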
@alyti Do you remember the contents of the target devices before creating the Rook/Ceph cluster?
@logan2211 Could you show me the operator log which indicates why OSD creation failed?
@travisn There seem to be multiple problems in OSD creation. At least it doesn’t come only from #10212, because that change is not in v1.9.2, yet @alyti’s problem and the problem in #10160 happened in v1.9.2 or older. I’m not sure if the root cause is in Rook or Ceph. Anyway, I’ll handle both this issue and #10160.
@logan2211 Thanks for the data point that this is a regression. This seems to be related to #10212, not sure there is a workaround other than sticking with v1.9.2. @satoru-takeuchi Can you take a look? Sounds like a number of users are hitting this.
I have the same “unparsable uuid” error when deploying new OSDs on 1.9.3. I’ve been deploying Rook clusters for years with:
The particular cluster I’m experiencing this issue on was deployed with 1.8.9. Then, after upgrading to 1.9.3, it will no longer add new OSDs because of the unparsable uuid error. Existing OSDs work fine. It doesn’t seem feasible to completely rebuild using PVCs. I’m using k3s on bare metal, @travisn.