rook: cephfs storageclass does not work
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior: wp-pv-claim and mysql-pv-claim are Bound, but the cephfs-pvc stays Pending.
Expected behavior: The cephfs-pvc should also become Bound.
How to reproduce it (minimal and precise):
# microk8s v1.14.1 on Ubuntu 16.04.5 LTS (Xenial Xerus)
kubectl apply -f ceph/common.yaml
kubectl apply -f ceph/operator.yaml
kubectl apply -f ceph/cluster-test.yaml
kubectl apply -f ceph/toolbox.yaml
kubectl apply -f ceph/csi/rbd/storageclass-test.yaml
kubectl apply -f . # installs mysql.yaml and wordpress.yaml
kubectl apply -f ceph/object-test.yaml
kubectl apply -f ceph/object-user.yaml
kubectl apply -f ceph/filesystem-test.yaml
kubectl apply -f ceph/csi/cephfs/storageclass.yaml
kubectl apply -f ceph/csi/cephfs/kube-registry.yaml
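To watch the failure after applying the manifests above, the claim and the CephFS provisioner can be checked like this (the provisioner pod label is an assumption based on the standard Rook CSI deployment):
# check the stuck claim and the cephfs provisioner pods
kubectl -n kube-system get pvc cephfs-pvc
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin-provisioner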
File(s) to submit:
- Cluster CR (custom resource), typically called cluster.yaml, if necessary
- Operator's logs, if necessary
- Crashing pod(s) logs, if necessary
# kubectl -n kube-system describe pvc cephfs-pvc
Name: cephfs-pvc
Namespace: kube-system
StorageClass: csi-cephfs
Status: Pending
Volume:
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"cephfs-pvc","namespace":"kube-system"},"spec":{"acc...
volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Mounted By: kube-registry-5b9c9854c5-psdsv
kube-registry-5b9c9854c5-r2g4l
kube-registry-5b9c9854c5-rmqms
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 5m10s (x11 over 38m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-f64c4574b-mvb7p_cc13f063-e0f5-11e9-813c-56d3e713ca9f failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal ExternalProvisioning 72s (x162 over 41m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
Normal Provisioning 10s (x12 over 41m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-f64c4574b-mvb7p_cc13f063-e0f5-11e9-813c-56d3e713ca9f External provisioner is provisioning volume for claim "kube-system/cephfs-pvc"
# kubectl -n rook-ceph logs csi-cephfsplugin-provisioner-f64c4574b-mvb7p -c csi-provisioner
I0927 08:03:15.230702 1 connection.go:183] GRPC response: {}
I0927 08:03:15.231643 1 connection.go:184] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0927 08:03:15.231740 1 controller.go:979] Final error received, removing PVC 264abaa4-e0f7-11e9-bead-ac1f6b84bde2 from claims in progress
W0927 08:03:15.231762 1 controller.go:886] Retrying syncing claim "264abaa4-e0f7-11e9-bead-ac1f6b84bde2", failure 11
E0927 08:03:15.231801 1 controller.go:908] error syncing claim "264abaa4-e0f7-11e9-bead-ac1f6b84bde2": failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0927 08:03:15.231842 1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"cephfs-pvc", UID:"264abaa4-e0f7-11e9-bead-ac1f6b84bde2", APIVersion:"v1", ResourceVersion:"17010356", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
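The DeadlineExceeded above is the external provisioner timing out on its gRPC CreateVolume call into the CSI driver, so the driver-side log in the same pod is also worth collecting (the container name is an assumption based on the Rook CSI provisioner deployment; verify with kubectl describe pod):
kubectl -n rook-ceph logs csi-cephfsplugin-provisioner-f64c4574b-mvb7p -c csi-cephfsplugin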
# ceph -s
cluster:
id: 9acd086e-493b-4ebd-a39f-2be2cce80080
health: HEALTH_OK
services:
mon: 1 daemons, quorum a (age 62m)
mgr: a(active, since 61m)
mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
osd: 1 osds: 1 up (since 61m), 1 in (since 61m)
rgw: 1 daemon active (my.store.a)
data:
pools: 9 pools, 72 pgs
objects: 407 objects, 457 MiB
usage: 1.1 TiB used, 611 GiB / 1.7 TiB avail
pgs: 72 active+clean
io:
client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
Environment:
- OS (e.g. from /etc/os-release): Ubuntu 16.04.5 LTS (Xenial Xerus), running microk8s v1.14.1
- Kernel (e.g. uname -a): Linux ubun 4.15.0-62-generic #69~16.04.1-Ubuntu SMP Fri Sep 6 02:43:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration:
- Rook version (use rook version inside of a Rook Pod): rook: v1.1.1
- Storage backend version (e.g. for ceph do ceph -v): ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
- Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): microk8s
- Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 13
- Comments: 65 (22 by maintainers)
Solution
Thanks @Madhu-1.
Please help me understand the issue: why did the above resolve it?
As I understood it, we created a CephFilesystem with the following: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/filesystem.yaml
Afterwards, we create a storageclass "rook-cephfs" which makes use of the filesystem called "myfs" created above: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/csi/cephfs/storageclass.yaml
Lastly, we create the PVC and test deployment: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/csi/cephfs/kube-registry.yaml
Why is it that we had to execute the above quoted commands to create the subvolumegroup called csi? Is this something we need to do every time we create a new filesystem? If so, I cannot find it documented here: https://rook.io/docs/rook/v1.4/ceph-filesystem.html If it is required to manually create the subvolumegroup called csi, should it not be done automatically by Rook?
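For reference, judging from the commands quoted later in this thread, the manual workaround amounts to creating a subvolumegroup named csi from the rook-ceph toolbox (the toolbox pod label is assumed from the standard toolbox.yaml; the filesystem name must match your CephFilesystem CR, myfs here):
# open a shell in the toolbox pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o name | head -n1) -- bash
# inside the toolbox: create the group the CSI driver expects, then verify
ceph fs subvolumegroup create myfs csi
ceph fs subvolumegroup ls myfs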
@ajarr The command (ceph fs subvolumegroup create mycephfs csi) gets stuck.
Having the same problem with rook 1.2.6, ceph 14.2.7, kubernetes 1.16.8 and calico CNI 3.13.1 (tried both ipip and vxlan backends). Using 3 nodes and the example production manifests (with changed names and references).
Cephfs-csi-provisioner is hanging on this process:
/usr/bin/python2.7 /usr/bin/ceph fs subvolumegroup create cephfs csi -m 10.99.154.70:6789,10.102.229.119:6789,10.98.121.172:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=/tmp/csi/keys/keyfile-739574613
failing after a few minutes with:
E0321 12:19:41.405149 1 controller.go:910] error syncing claim "840cbb9b-fa9a-4026-b9a3-efccc48513af": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
and then looping forever with:
W0321 12:19:41.926998 1 controller.go:887] Retrying syncing claim "840cbb9b-fa9a-4026-b9a3-efccc48513af", failure 2
E0321 12:19:41.927087 1 controller.go:910] error syncing claim "840cbb9b-fa9a-4026-b9a3-efccc48513af": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-840cbb9b-fa9a-4026-b9a3-efccc48513af already exists
while "ceph fs subvolumegroup create cephfs csi" is still hanging in the background.
Ceph OSD/Mon Pod IPs and ports are all reachable from the nodes and pods. When using hostNetwork: true in the CephCluster, the CephFS PVC creation actually works, for whatever reason…
Edit: PVC creation also works with hostNetwork: false and kube-proxy in ipvs mode; see the sketch below.
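For anyone trying the same two workarounds, a hedged sketch (the hostNetwork field matches the Rook v1.1/v1.2 CephCluster CRD, with newer releases using network.provider: host instead; the kube-proxy steps assume a kubeadm-style cluster where kube-proxy reads its mode from a ConfigMap):
# CephCluster snippet enabling host networking (Rook v1.1/v1.2 field):
#   spec:
#     network:
#       hostNetwork: true
# check and switch kube-proxy to ipvs mode:
kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'
kubectl -n kube-system edit configmap kube-proxy        # set mode: "ipvs"
kubectl -n kube-system delete pod -l k8s-app=kube-proxy # restart kube-proxy to pick it up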
Strange but true, thanks a lot!
I have had these same issues, both with hostNetwork on and hostNetwork off. Rook @ 1.1.7, Ceph at 14.2.5. Cannot complete the instructions since ceph fs subvolumegroup create cephfs csi hangs.
Hi guys, this seems to be a problem with the CNI provider that I use: previously I used the Weave Net CNI, and now that I have switched to the Calico CNI, CephFS runs perfectly.
#cheers
@satwikk:
UnboundLocalError: local variable 'ret' referenced before assignment
This happens because you pressed Ctrl+C; additionally, ret was not initialized in do_command as it is supposed to be (ret = 0, "", ""). The issue we need to check here is why it was hanging. Looking at the logs provided here and the ceph status output, I see that the volume myfs was created by default, but the operation below is not able to identify it (ENOENT), and you are trying to use myfs later, which should be the cause of this problem. Could you please explain what the operation below is, and whether you are passing the parameters correctly?
@veezhang from the rook ceph-toolbox pod, can you try issuing the following command and check its output?
That was the last ceph fs volumes command (https://docs.ceph.com/docs/master/cephfs/fs-volumes/#fs-subvolume-groups) executed as per the logs in https://github.com/rook/rook/issues/4006#issuecomment-540428989.
Can you share the corresponding ceph-mgr log for that command?
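In Rook the active mgr runs as its own deployment, so a hedged way to pull that log (assuming the default mgr name "a", matching the ceph -s output above) is:
kubectl -n rook-ceph logs deploy/rook-ceph-mgr-a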
If the above command succeeds, try issuing the following commands too.
The only critical issue I'm aware of that some of us hit in Ceph v14.2.4, and that will be fixed in Ceph v14.2.5, is https://tracker.ceph.com/issues/41933. But you don't seem to be hitting this issue either, as fs subvolume getpath is not executed by your csi-cephfs plugin pod?
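For reference, the documented form of that call in the Ceph fs-volumes interface (the names in angle brackets are placeholders):
# per the Ceph fs-volumes docs; <vol_name>, <subvol_name>, <group_name> are placeholders
ceph fs subvolume getpath <vol_name> <subvol_name> --group_name <group_name>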