rook: rook-ceph-osd-prepare fails with bluestore(/var/lib/ceph/osd/ceph-1/) _read_fsid unparsable uuid
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior: OSD creation fails with
stderr: 2022-04-25T18:12:29.836+0000 7f023a2a43c0 -1 bluestore(/var/lib/ceph/osd/ceph-1/) _read_fsid unparsable uuid
stderr: 2022-04-25T18:12:30.024+0000 7f023a2a43c0 -1 bluefs _replay 0x0: stop: uuid 00000000-0000-0000-0000-000000000000 != super.uuid 68ba3e0d-e42c-46b0-a51b-e8e6ab918ad6, block dump:
stderr: 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
stderr: *
stderr: 00000ff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
stderr: 00001000
stderr: 2022-04-25T18:12:30.664+0000 7f023a2a43c0 -1 rocksdb: verify_sharding unable to list column families: NotFound:
stderr: 2022-04-25T18:12:30.664+0000 7f023a2a43c0 -1 bluestore(/var/lib/ceph/osd/ceph-1/) _open_db erroring opening db:
stderr: 2022-04-25T18:12:31.224+0000 7f023a2a43c0 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
stderr: 2022-04-25T18:12:31.224+0000 7f023a2a43c0 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-1/: (5) Input/output error
--> Was unable to complete a new OSD, will rollback changes
Expected behavior: OSD creation succeeds
How to reproduce it (minimal and precise):
Unclear if it’s easily reproducible outside my environment. I’ve tried wiping the disks per the Rook cleanup guide (roughly the sequence sketched below) and increasing fs.aio-max-nr, to no avail.
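For reference, the wipe I mean is roughly the zap sequence from the Rook cleanup guide plus the sysctl bump. The device path just matches the cluster spec below, and the aio-max-nr value is an example, not a number from this report; on Talos these have to be run from a privileged pod, since there is no host shell:

```sh
# Zap the device before re-running the OSD prepare job (sequence from the Rook cleanup guide).
DISK="/dev/nvme1n1"   # example device, matching the cluster spec below

sgdisk --zap-all "$DISK"                                        # wipe GPT/MBR metadata
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync   # clear any leftover bluestore signature
blkdiscard "$DISK"                                              # TRIM the whole SSD
partprobe "$DISK"                                               # re-read the partition table

# Raise the async I/O limit (1048576 is an example value, not taken from this report).
sysctl -w fs.aio-max-nr=1048576
```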
File(s) to submit:
- Cluster CR (custom resource), typically called `cluster.yaml`, if necessary
My HelmRelease definition:
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: rook-ceph-cluster
  namespace: rook-ceph
spec:
  interval: 5m
  chart:
    spec:
      chart: rook-ceph-cluster
      version: v1.9.1
      sourceRef:
        kind: HelmRepository
        name: rook-ceph-charts
        namespace: flux-system
  install:
    createNamespace: true
    remediation:
      retries: 5
  upgrade:
    remediation:
      retries: 5
  dependsOn:
    - name: rook-ceph-operator
      namespace: rook-ceph
  values:
    monitoring:
      # TODO(rook-ceph): Turn on monitoring
      enabled: false
      createPrometheusRules: false
    ingress:
      dashboard:
        annotations:
          kubernetes.io/ingress.class: "traefik"
          cert-manager.io/cluster-issuer: "letsencrypt-production"
          hajimari.io/appName: "rook-ceph-dashboard"
          hajimari.io/enable: "true"
          hajimari.io/icon: "web"
          traefik.ingress.kubernetes.io/router.entrypoints: "websecure"
        host:
          name: &host "rook.${SECRET_DOMAIN}"
          path: "/"
        tls:
          - secretName: tls.rook-ceph
            hosts:
              - *host
    configOverride: |
      [global]
      bdev_enable_discard = true
      bdev_async_discard = true
    cephClusterSpec:
      cephVersion:
        image: quay.io/ceph/ceph:v17.2.0
      crashCollector:
        disable: false
      dashboard:
        enabled: true
        urlPrefix: /
      storage:
        useAllNodes: false
        useAllDevices: false
        config:
          osdsPerDevice: "1"
        nodes:
          - name: k8s-control01
            devices:
              - name: "nvme1n1"
          - name: k8s-control02
            devices:
              - name: "nvme0n1"
          - name: k8s-control03
            devices:
              - name: "nvme1n1"
    cephBlockPools:
      - name: ceph-blockpool
        spec:
          failureDomain: host
          replicated:
            size: 3
        storageClass:
          enabled: true
          name: ceph-block
          isDefault: true
          reclaimPolicy: Delete
          allowVolumeExpansion: true
          parameters:
            imageFormat: "2"
            imageFeatures: layering
            csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
            csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
            csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
            csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
            csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
            csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
            csi.storage.k8s.io/fstype: ext4
- Operator’s logs, if necessary
Operator logs and the pod logs are in this gist due to their size: https://gist.github.com/bmwinstead/af8b6b963d12702ed8d90e849235c7d1
- Crashing pod(s) logs, if necessary
Environment:
- OS (e.g. from /etc/os-release): Talos
- Kernel (e.g. `uname -a`): 5.15.32-talos
- Cloud provider or hardware configuration: Simple whitebox server with an AMD Ryzen 5000-series CPU and ECC RAM; the OSD fails to come up on a combination of different NVMe SSDs: INTEL SSDPEKNW020T8 and Sabrent Rocket 4 2TB
- Rook version (use `rook version` inside of a Rook Pod):
[rook@rook-ceph-tools-d6d7c985c-dxnhx /]$ rook version
rook: v1.9.0-alpha.0.128.g4a8cb1a0a
go: go1.17.8
- Storage backend version (e.g. for ceph do `ceph -v`):
[rook@rook-ceph-tools-d6d7c985c-dxnhx /]$ ceph -v
ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
- Kubernetes version (use `kubectl version`): 1.23.5
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Talos
- Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox):
[rook@rook-ceph-tools-d6d7c985c-dxnhx /]$ ceph health
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 52 pgs inactive; OSD count 0 < osd_pool_default_size 3
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 28 (10 by maintainers)
Commits related to this issue
- Try adjusting the prepareosd memory limit and disabling Ceph's bluefs_buffered_io to see whether the OSD can be added normally https://github.com/rook/rook/issues/10160 — committed to locoz666/k3s-cluster by locoz666 2 years ago
Just a note: I ran into this issue when hitting resource limits. Raising them allowed the OSDs to be created in two clusters.

Thank you for the tip. I just had the same issue with a large SSD (7 TiB) and increasing the memory limit solved it.
Can confirm. The default `400Mi` limit appears to be too small (my disks are 7 TiB each, but that’s probably not related). Changing the `prepareosd` resource limit fixes it: with the default `400Mi` the prepare pod gets OOM-killed.
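For anyone hitting the same OOM kill, here is a minimal sketch of raising that limit through the rook-ceph-cluster chart values used earlier in this issue; `prepareosd` is the CephCluster resources key for the osd-prepare jobs, and the 2Gi figure is only a placeholder, not a value taken from this thread:

```yaml
# Sketch only: goes under the HelmRelease values; the chart passes cephClusterSpec
# straight through to the CephCluster CR.
cephClusterSpec:
  resources:
    prepareosd:
      limits:
        memory: "2Gi"   # placeholder; the thread only says the default 400Mi was too small
      requests:
        memory: "400Mi" # keep requests modest; the limit is what triggers the OOM kill
```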
I’d be careful about jumping to conclusions on the cause of a randomly appearing issue. I only observe the issue every 20th or even 50th time when running mkfs on an OSD. It seems, also according to https://tracker.ceph.com/issues/54019, that switching `bluefs_buffered_io` to `false` avoids the issue.

I observed similar issues at random when deploying Ceph 16.2.7 via Ansible inside a GitLab CI pipeline to a VM: `ceph-osd -i 4 --mkfs --osd-objectstore=bluestore --osd-uuid de84f0aa-601a-473a-ad0b-7a2d403d0588 --monmap /tmp/monmap` exits with return code 250 and an error on stderr, and a retry with the exact same command works.
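If you want to try the `bluefs_buffered_io` workaround from that tracker issue in a Rook cluster, it can go into the same `configOverride` block shown in the values above; a sketch, where only the last line is new relative to the values in this report:

```yaml
# Sketch: disable bluefs_buffered_io via the Ceph config override in the chart values.
configOverride: |
  [global]
  bdev_enable_discard = true
  bdev_async_discard = true
  bluefs_buffered_io = false
```

Alternatively, on a running cluster, `ceph config set global bluefs_buffered_io false` from the toolbox should have the same effect.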