rook: cephfs storageclass does not work

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior: wp-pv-claim and mysql-pv-claim are Bound, but the cephfs-pvc is Pending

Expected behavior: The cephfs-pvc is Bound as well

How to reproduce it (minimal and precise):

# microk8s  v1.14.1 on Ubuntu 16.04.5 LTS (Xenial Xerus)
kubectl apply -f ceph/common.yaml
kubectl apply -f ceph/operator.yaml
kubectl apply -f ceph/cluster-test.yaml
kubectl apply -f ceph/toolbox.yaml
kubectl apply -f ceph/csi/rbd/storageclass-test.yaml
kubectl apply -f . # install mysql.yaml wordpress.yaml
kubectl apply -f ceph/object-test.yaml
kubectl apply -f ceph/object-user.yaml
kubectl apply -f ceph/filesystem-test.yaml
kubectl apply -f ceph/csi/cephfs/storageclass.yaml
kubectl apply -f ceph/csi/cephfs/kube-registry.yaml
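
For reference, the cephfs-pvc that kube-registry.yaml creates looks roughly like the following (a sketch reconstructed from the describe output and provisioner logs below, not a verbatim copy of the upstream manifest):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
  namespace: kube-system
spec:
  accessModes:
    - ReadWriteMany        # the registry deployment mounts the claim from several pods
  resources:
    requests:
      storage: 1Gi         # matches required_bytes 1073741824 in the CreateVolume request below
  storageClassName: csi-cephfs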

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator’s logs, if necessary
  • Crashing pod(s) logs, if necessary
# kubectl -n kube-system describe pvc cephfs-pvc
Name:          cephfs-pvc
Namespace:     kube-system
StorageClass:  csi-cephfs
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                 {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"cephfs-pvc","namespace":"kube-system"},"spec":{"acc...
               volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Mounted By:    kube-registry-5b9c9854c5-psdsv
               kube-registry-5b9c9854c5-r2g4l
               kube-registry-5b9c9854c5-rmqms
Events:
  Type     Reason                Age                   From                                                                                                             Message
  ----     ------                ----                  ----                                                                                                             -------
  Warning  ProvisioningFailed    5m10s (x11 over 38m)  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-f64c4574b-mvb7p_cc13f063-e0f5-11e9-813c-56d3e713ca9f  failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   ExternalProvisioning  72s (x162 over 41m)   persistentvolume-controller                                                                                      waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          10s (x12 over 41m)    rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-f64c4574b-mvb7p_cc13f063-e0f5-11e9-813c-56d3e713ca9f  External provisioner is provisioning volume for claim "kube-system/cephfs-pvc"
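
The provisioner referenced in these events can be located and inspected as follows (the pod label is assumed from the default Rook CSI deployment):

$ kubectl -n rook-ceph get pod -l app=csi-cephfsplugin-provisioner
$ kubectl -n rook-ceph logs <csi-cephfsplugin-provisioner-pod> -c csi-provisioner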


# kubectl -n rook-ceph logs csi-cephfsplugin-provisioner-f64c4574b-mvb7p -c csi-provisioner
I0927 08:03:15.230702       1 connection.go:183] GRPC response: {}
I0927 08:03:15.231643       1 connection.go:184] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0927 08:03:15.231740       1 controller.go:979] Final error received, removing PVC 264abaa4-e0f7-11e9-bead-ac1f6b84bde2 from claims in progress
W0927 08:03:15.231762       1 controller.go:886] Retrying syncing claim "264abaa4-e0f7-11e9-bead-ac1f6b84bde2", failure 11
E0927 08:03:15.231801       1 controller.go:908] error syncing claim "264abaa4-e0f7-11e9-bead-ac1f6b84bde2": failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0927 08:03:15.231842       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"cephfs-pvc", UID:"264abaa4-e0f7-11e9-bead-ac1f6b84bde2", APIVersion:"v1", ResourceVersion:"17010356", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded

# ceph -s    
  cluster:
    id:     9acd086e-493b-4ebd-a39f-2be2cce80080
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum a (age 62m)
    mgr: a(active, since 61m)
    mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
    osd: 1 osds: 1 up (since 61m), 1 in (since 61m)
    rgw: 1 daemon active (my.store.a)
 
  data:
    pools:   9 pools, 72 pgs
    objects: 407 objects, 457 MiB
    usage:   1.1 TiB used, 611 GiB / 1.7 TiB avail
    pgs:     72 active+clean
 
  io:
    client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

Environment:

  • OS (e.g. from /etc/os-release): microk8s v1.14.1 on Ubuntu 16.04.5 LTS (Xenial Xerus)
  • Kernel (e.g. uname -a): Linux ubun 4.15.0-62-generic #69~16.04.1-Ubuntu SMP Fri Sep 6 02:43:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): rook: v1.1.1
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 13
  • Comments: 65 (22 by maintainers)

Most upvoted comments

Solution

  • You need to create the filesystem yourself (the Rook operator is not supposed to create it):
kubectl apply -f https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/filesystem.yaml
  • Create the storage class (a rough sketch of both manifests follows after this list)
  • Create the PVC, PV, Pod, or whatever else you need
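
For reference, a rough sketch of the two manifests involved, loosely following the upstream examples linked above (the pool name, replica sizes, and secret names are assumptions based on the Rook defaults, not a verbatim copy):

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 1              # test-sized; use 3 in production
  dataPools:
    - replicated:
        size: 1            # test-sized; use 3 in production
  metadataServer:
    activeCount: 1
    activeStandby: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  pool: myfs-data0         # first data pool created for the myfs filesystem
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete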

Thanks @Madhu-1

ceph fs subvolumegroup create myfs csi
ceph fs subvolume create myfs subvol00 --group_name csi
ceph fs subvolume getpath myfs subvol00 --group_name csi
ceph fs subvolume rm myfs subvol00 --group_name csi

Please help me understand the issue and why the above commands resolved it.

As I understood it, we created a CephFilesystem with the following: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/filesystem.yaml

Afterwards, we create a StorageClass "rook-cephfs" which makes use of the filesystem "myfs" created above: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/csi/cephfs/storageclass.yaml

Lastly, we create the PVC and test deployment https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/csi/cephfs/kube-registry.yaml

Why is it that we had to execute the commands quoted above to create the subvolumegroup called csi? Is this something we need to do every time we create a new filesystem? If so, I cannot find it documented here (https://rook.io/docs/rook/v1.4/ceph-filesystem.html). If it is required to manually create the csi subvolumegroup, should Rook not do it automatically?
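
One way to check whether the csi subvolumegroup already exists is from the toolbox pod; a sketch, assuming the filesystem is called myfs and that your Ceph release already ships the ls subcommands:

# list subvolume groups of the myfs filesystem
$ ceph fs subvolumegroup ls myfs

# create the group manually if it is missing (this is what the quoted commands do)
$ ceph fs subvolumegroup create myfs csi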

@ajarr The command (ceph fs subvolumegroup create mycephfs csi) gets stuck.

Having the same problem with Rook 1.2.6, Ceph 14.2.7, Kubernetes 1.16.8, and Calico CNI 3.13.1 (tried both IPIP and VXLAN backends). Using 3 nodes and the example production manifests (with changed names and references).

The cephfs-csi-provisioner is hanging on this process:

/usr/bin/python2.7 /usr/bin/ceph fs subvolumegroup create cephfs csi -m 10.99.154.70:6789,10.102.229.119:6789,10.98.121.172:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=/tmp/csi/keys/keyfile-739574613

failing after a few minutes with:

E0321 12:19:41.405149 1 controller.go:910] error syncing claim "840cbb9b-fa9a-4026-b9a3-efccc48513af": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded

and then looping forever with:

W0321 12:19:41.926998 1 controller.go:887] Retrying syncing claim "840cbb9b-fa9a-4026-b9a3-efccc48513af", failure 2
E0321 12:19:41.927087 1 controller.go:910] error syncing claim "840cbb9b-fa9a-4026-b9a3-efccc48513af": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-840cbb9b-fa9a-4026-b9a3-efccc48513af already exists

while "ceph fs subvolumegroup create cephfs csi" is still hanging in the background.

Ceph OSD/Mon Pod IPs and ports are all reachable from the nodes and pods. When using hostNetwork: true in the CephCluster the CephFS PVC creation actually works, for whatever reason…

Edit: PVC creation also works with hostNetwork: false and kube-proxy in ipvs mode
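
For anyone trying the hostNetwork workaround, the setting lives in the CephCluster CR; a sketch only, and the exact field depends on the Rook version (older releases use network.hostNetwork, newer ones network.provider):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    hostNetwork: true      # newer Rook releases use "provider: host" instead
  # ...rest of the cluster spec unchanged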

Hi guys, this seems to have been a problem with the CNI provider I was using: previously I used Weave Net, and after switching to Calico, CephFS runs perfectly.

#cheers

Strange but true, thanks a lot!

I have had these same issues, both with hostNetwork on and hostNetwork off. Rook @ 1.1.7, Ceph at 14.2.5. Cannot complete instructions since ceph fs subvolumegroup create cephfs csi hangs.


@satwikk: The UnboundLocalError: local variable 'ret' referenced before assignment happens because you pressed Ctrl+C. Additionally, ret was not initialized in do_command; it is supposed to be ret = 0, "", "".

The issue we need to check here is why it was hanging. Looking at the logs and ceph status output provided here, I see that the volume myfs was created by default, but the operation below is not able to find it (ENOENT), and you are trying to use myfs later; this should be the cause of the problem.

Could you please explain what the operation below is, and whether you are passing the parameters correctly?

I1010 07:11:52.254715       1 utils.go:119] ID: 6768 GRPC call: /csi.v1.Controller/CreateVolume
I1010 07:11:52.254745       1 utils.go:120] ID: 6768 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-3be36efb-eb2d-11e9-bead-ac1f6b84bde2","parameters":{"clusterID":"rook-ceph","fsName":"myfs","pool":"myfs-data0"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":5}}]}
I1010 07:11:52.258214       1 util.go:48] ID: 6768 cephfs: EXEC ceph [-m 10.152.183.224:6789 --id admin --keyfile=***stripped*** -c /etc/ceph/ceph.conf fs get myfs --format=json]
E1010 07:11:52.802795       1 controllerserver.go:84] ID: 6768 validation and extraction of volume options failed: an error occurred while running (21749) ceph [-m 10.152.183.224:6789 --id admin --keyfile=***stripped*** -c /etc/ceph/ceph.conf fs get myfs --format=json]: exit status 2: Error ENOENT: filesystem 'myfs' not found
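
A quick way to reproduce what the provisioner sees is to run the same lookup from the toolbox pod (the commands mirror the EXEC line in the log above):

# list the filesystems the cluster knows about
$ ceph fs ls

# the call the CSI driver makes; ENOENT here matches the error in the log above
$ ceph fs get myfs --format=json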

@veezhang from the Rook ceph-toolbox pod, can you try issuing the following command and check its output:

$ ceph fs subvolumegroup create mycephfs csi

That was the last ceph fs volumes command (https://docs.ceph.com/docs/master/cephfs/fs-volumes/#fs-subvolume-groups) executed, as per the logs in https://github.com/rook/rook/issues/4006#issuecomment-540428989.

Can you share the corresponding ceph-mgr log for that command?
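
With the default Rook naming, the active mgr log can be pulled straight from its pod (the deployment name is assumed from the "mgr: a" line in ceph -s above):

$ kubectl -n rook-ceph logs deploy/rook-ceph-mgr-a
# or select it by label
$ kubectl -n rook-ceph logs -l app=rook-ceph-mgr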

If the above command succeeds, try issuing the following commands too:

$ ceph fs subvolume create mycephfs subvol00 --group_name csi
$ ceph fs subvolume getpath mycephfs subvol00 --group_name csi
$ ceph fs subvolume rm mycephfs subvol00 --group_name csi

The only critical issue I’m aware of that some of us hit in Ceph v14.2.4, and that will be fixed in Ceph v14.2.5, is https://tracker.ceph.com/issues/41933. But you don’t seem to be hitting this issue either, as fs subvolume getpath is not executed by your csi-cephfs plugin pod?