rook: [Important] Pod can't mount volume after sudden server shutdown
**Bug Report**
After the server shut down unexpectedly, the Kubernetes pods would no longer start.
Pod error message:
```
MountVolume.MountDevice failed for volume "pvc-cd68303a-c4e9-49df-be77-5289a77d2a73" :
rpc error: code = Internal desc = 'xfs_repair' found errors on device /dev/rbd0 but could not correct them:
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair. Note that destroying
the log may cause corruption -- please attempt a mount of the filesystem
before doing this.
```
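The xfs_repair output itself describes the usual recovery path: mount the filesystem once so the XFS log is replayed, then unmount it cleanly before the CSI driver retries. Because the volume is an RBD image, this has to happen somewhere the image can be mapped, typically the worker node. Below is a minimal sketch of that procedure; the pool/image name `replicapool/csi-vol-xxxxxxxx` is a placeholder and not taken from this issue:

```sh
# Map the RBD image on a node that has the rbd kernel module and Ceph
# credentials. Pool/image names here are placeholders; `rbd map` prints the
# device node it creates (e.g. /dev/rbd0).
rbd map replicapool/csi-vol-xxxxxxxx

# Mounting once lets the kernel replay the XFS journal; then unmount cleanly.
mkdir -p /mnt/recover
mount /dev/rbd0 /mnt/recover
umount /mnt/recover

# Only if the mount itself fails: -L destroys the log and can lose the most
# recent writes, so treat it strictly as a last resort (as the error warns).
# xfs_repair -L /dev/rbd0

# Unmap so the CSI driver can attach and mount the image again.
rbd unmap /dev/rbd0
```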
The rook-ceph cluster itself works fine and reports healthy status. However, I also can't manually map the CSI image with `rbd` from inside the rook-ceph-tools pod; it fails with the following errors:
```
rbd: failed to set udev buffer size: (1) Operation not permitted
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (30) Read-only file system
```
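Note that `rbd map` needs to write to the host's /sys and talk to udev, which the unprivileged toolbox pod cannot do, so the map generally has to be run on a node. To know which RBD image backs the failing PVC (and which node it was attached to), the PV and VolumeAttachment objects carry that information. A sketch follows; the exact `volumeAttributes` keys differ between ceph-csi versions, so treat the field names as assumptions to verify:

```sh
# Inspect the CSI attributes recorded on the PV; for ceph-csi RBD volumes
# these typically include the pool and the backing image name
# (key names vary by ceph-csi version).
kubectl get pv pvc-cd68303a-c4e9-49df-be77-5289a77d2a73 \
  -o jsonpath='{.spec.csi.volumeAttributes}{"\n"}'

# List which node each PV is (or was last) attached to via VolumeAttachments.
kubectl get volumeattachment \
  -o jsonpath='{range .items[*]}{.spec.source.persistentVolumeName}{" -> "}{.spec.nodeName}{"\n"}{end}'
```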
**Deviation from expected behavior:**

**Expected behavior:**

**How to reproduce it (minimal and precise):**
**File(s) to submit**:
- Cluster CR (custom resource), typically called `cluster.yaml`, if necessary
- Operator's logs, if necessary
- Crashing pod(s) logs, if necessary
**Environment**:
- OS (e.g. from /etc/os-release): Ubuntu 18.04
- Kernel (e.g. `uname -a`): 4.15.0-88-generic
- Cloud provider or hardware configuration:
- Rook version (use `rook version` inside of a Rook Pod): 1.2.3
- Storage backend version (e.g. for ceph do `ceph -v`): 14.2.5
- Kubernetes version (use `kubectl version`): 1.16.3
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Rancher
- Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox): HEALTH_OK
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 17 (7 by maintainers)
Commits related to this issue
- This PR updates the kubernetes utils packages we are using. we had hit an issue in xfs_repair as this is fixed in recent kubernetes utils we are updating it for the same reason more info at https://g... — committed to ceph/ceph-csi by humblec 4 years ago
- Update vendor folder with latest dependencies NOTE: This PR also updates the kubernetes utils packages we are using. we had hit an issue in xfs_repair as this is fixed in recent kubernetes utils we ... — committed to humblec/ceph-csi by humblec 4 years ago
The problem has been fixed in ceph-csi 2.1.0. It would be nice to have that as the default version in Rook, because this bug is really problematic.
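Until Rook ships ceph-csi 2.1.0 as its default, the CSI image used by the operator can be overridden. The following is a sketch, assuming the Rook operator honors the `ROOK_CSI_CEPH_IMAGE` environment variable as in Rook 1.2's operator.yaml; the image tag and label selector are examples to verify against the Rook version in use:

```sh
# Point the Rook operator at a newer ceph-csi image and let it redeploy the
# CSI driver pods. Variable name and image tag follow Rook 1.2's operator.yaml;
# confirm both against the deployed Rook version.
kubectl -n rook-ceph set env deployment/rook-ceph-operator \
  ROOK_CSI_CEPH_IMAGE="quay.io/cephcsi/cephcsi:v2.1.0"

# Watch the RBD plugin pods restart with the new image
# (label may differ between Rook versions).
kubectl -n rook-ceph get pods -l app=csi-rbdplugin -w
```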