rook: [Important] Pod can't mount volume after sudden server shutdown

**Bug Report**

After a sudden server shutdown, the Kubernetes pods stopped working. Pod error message:

MountVolume.MountDevice failed for volume "pvc-cd68303a-c4e9-49df-be77-5289a77d2a73" : rpc error: code = Internal desc = 'xfs_repair' found errors on device /dev/rbd0 but could not correct them:
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this
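
For context, the manual recovery path that the xfs_repair message describes looks roughly like the sketch below. The device path /dev/rbd0 comes from the error above; the pool/image name and mount point are placeholders, not values from this cluster.

# Map the image manually (replicapool/csi-vol-xxxx and /mnt/recover are placeholders)
rbd map replicapool/csi-vol-xxxx
# Mount once so XFS can replay its metadata log, then unmount cleanly
mount /dev/rbd0 /mnt/recover
umount /mnt/recover
# Only if the mount itself fails: discard the log and repair (may lose recent metadata)
xfs_repair -L /dev/rbd0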

However, the rook-ceph cluster itself works fine and its health is OK. I also can't manually map the CSI RBD image with rbd from inside rook-ceph-tools; it gives me the following error:

rbd: failed to set udev buffer size: (1) Operation not permitted
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (30) Read-only file system
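
The map failure inside the toolbox is somewhat expected: a krbd map writes to the host's sysfs, which the rook-ceph-tools pod normally cannot do (hence "sysfs write failed" / "Read-only file system"). A rough debugging sketch, where replicapool/csi-vol-xxxx is a placeholder image name:

# From the toolbox: inspect the image and any stale watchers (librados only, no kernel access needed)
rbd status replicapool/csi-vol-xxxx
rbd info replicapool/csi-vol-xxxx
# From a node that has the rbd kernel module, supplying monitor/keyring details as needed
rbd map replicapool/csi-vol-xxxx --id admin --keyring /etc/ceph/keyring
rbd showmapped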

Deviation from expected behavior:

Expected behavior:

How to reproduce it (minimal and precise):

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator’s logs, if necessary
  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.
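
As a concrete example of gathering the relevant logs, assuming the default rook-ceph namespace and standard resource names (verify these against your deployment):

# Operator logs
kubectl -n rook-ceph logs deploy/rook-ceph-operator
# CSI RBD plugin pods and their logs (label and container name assumed from a default Rook install)
kubectl -n rook-ceph get pods -l app=csi-rbdplugin
kubectl -n rook-ceph logs <csi-rbdplugin-pod-name> -c csi-rbdplugin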

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04
  • Kernel (e.g. uname -a): 4.15.0-88-generic
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): 1.2.3
  • Storage backend version (e.g. for ceph do ceph -v): 14.2.5
  • Kubernetes version (use kubectl version): 1.16.3
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): rancher
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

The problem has been fixed in ceph-csi 2.1.0. It would be nice to have that as the default version in Rook. This bug is really problematic.
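
Until a Rook release ships it by default, one way to try ceph-csi 2.1.0 is to override the CSI image on the operator. A minimal sketch, assuming the operator honours the ROOK_CSI_CEPH_IMAGE setting in your Rook version (check the operator.yaml and docs for 1.2.x before applying):

# Point the operator at the newer ceph-csi image (setting name and tag assumed, verify first)
kubectl -n rook-ceph set env deploy/rook-ceph-operator ROOK_CSI_CEPH_IMAGE=quay.io/cephcsi/cephcsi:v2.1.0
# Watch the csi-rbdplugin / csi-rbdplugin-provisioner pods restart with the new image
kubectl -n rook-ceph get pods -w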