rook: [Important] Pod can't mount volume after sudden server shutdown

**Bug Report**

After a sudden server shutdown, the Kubernetes pods stopped working. Pod error message:

MountVolume.MountDevice failed for volume "pvc-cd68303a-c4e9-49df-be77-5289a77d2a73" : rpc error: code = Internal desc = 'xfs_repair' found errors on device /dev/rbd0 but could not correct them:
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this
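
For context, the manual recovery path that the xfs_repair message describes looks roughly like the sketch below. The device path /dev/rbd0 comes from the error above; the pool/image name and mount point are placeholders, not values from this cluster.

# Map the image manually (replicapool/csi-vol-xxxx and /mnt/recover are placeholders)
rbd map replicapool/csi-vol-xxxx
# Mount once so XFS can replay its metadata log, then unmount cleanly
mount /dev/rbd0 /mnt/recover
umount /mnt/recover
# Only if the mount itself fails: discard the log and repair (may lose recent metadata)
xfs_repair -L /dev/rbd0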

However, the rook-ceph cluster itself works fine and its health is OK. I also can't manually map the CSI RBD image with rbd from inside rook-ceph-tools; it gives me the following error:

rbd: failed to set udev buffer size: (1) Operation not permitted
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (30) Read-only file system
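
The map failure inside the toolbox is somewhat expected: a krbd map writes to the host's sysfs, which the rook-ceph-tools pod normally cannot do (hence "sysfs write failed" / "Read-only file system"). A rough debugging sketch, where replicapool/csi-vol-xxxx is a placeholder image name:

# From the toolbox: inspect the image and any stale watchers (librados only, no kernel access needed)
rbd status replicapool/csi-vol-xxxx
rbd info replicapool/csi-vol-xxxx
# From a node that has the rbd kernel module, supplying monitor/keyring details as needed
rbd map replicapool/csi-vol-xxxx --id admin --keyring /etc/ceph/keyring
rbd showmapped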

Deviation from expected behavior:

Expected behavior:

How to reproduce it (minimal and precise):

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator’s logs, if necessary
  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.
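
As a concrete example of gathering the relevant logs, assuming the default rook-ceph namespace and standard resource names (verify these against your deployment):

# Operator logs
kubectl -n rook-ceph logs deploy/rook-ceph-operator
# CSI RBD plugin pods and their logs (label and container name assumed from a default Rook install)
kubectl -n rook-ceph get pods -l app=csi-rbdplugin
kubectl -n rook-ceph logs <csi-rbdplugin-pod-name> -c csi-rbdplugin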

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04
  • Kernel (e.g. uname -a): 4.15.0-88-generic
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): 1.2.3
  • Storage backend version (e.g. for ceph do ceph -v): 14.2.5
  • Kubernetes version (use kubectl version): 1.16.3
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): rancher
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

The problem has been fixed in ceph-csi 2.1.0. It would be nice to have that as the default version in Rook. This bug is really problematic.
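
Until a Rook release ships it by default, one way to try ceph-csi 2.1.0 is to override the CSI image on the operator. A minimal sketch, assuming the operator honours the ROOK_CSI_CEPH_IMAGE setting in your Rook version (check the operator.yaml and docs for 1.2.x before applying):

# Point the operator at the newer ceph-csi image (setting name and tag assumed, verify first)
kubectl -n rook-ceph set env deploy/rook-ceph-operator ROOK_CSI_CEPH_IMAGE=quay.io/cephcsi/cephcsi:v2.1.0
# Watch the csi-rbdplugin / csi-rbdplugin-provisioner pods restart with the new image
kubectl -n rook-ceph get pods -w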