rook: Timeout expired waiting for volumes to attach or mount for pod

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior: I was running through the mysql tutorial with slight variations (mainly deploying busybox instead of mysql), and my pod is stuck in the following state:

Events:
  Type     Reason            Age                        From                    Message
  ----     ------            ----                       ----                    -------
  Warning  FailedScheduling  7m1s (x6 over 7m1s)        default-scheduler       pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled         7m1s                       default-scheduler       Successfully assigned default/test-pod to <ip_address>
  Warning  FailedMount       <invalid> (x4 over 4m58s)  kubelet, <ip_address>  Unable to mount volumes for pod "test-pod_default(6e162e59-5b18-11e9-a689-001dd8b7007f)": timeout expired waiting for volumes to attach or mount for pod "default"/"test-pod". list of unmounted volumes=[pvc]. list of unattached volumes=[pvc default-token-xn2qp]

I am running Kubernetes through RKE, and I found a Rancher-specific FlexVolume configuration on the prerequisites docs page. I made sure my kubelet had the following:

services:
  kubelet:
    extra_args:
      volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
    extra_binds:
      - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec

To my understanding, the FLEX_VOLUME_DIR path does not need to be set in operator.yml because Rancher uses the exact same location.
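
For reference, if the paths ever did differ, the override would go on the operator Deployment as an environment variable. This is only a sketch: I am assuming the Rook v0.9 variable name is FLEXVOLUME_DIR_PATH, so verify it against the comments in your copy of operator.yml:

        env:
        # Hypothetical override of the FlexVolume plugin directory on the
        # rook-ceph-operator container; only needed when the kubelet uses a
        # non-default path (not the case here, since RKE and Rook agree).
        - name: FLEXVOLUME_DIR_PATH
          value: "/usr/libexec/kubernetes/kubelet-plugins/volume/exec"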

I have since rebuilt and redeployed the clusters (Kubernetes and Rook) but still see the same error. The PVC and PV are being created and bound successfully:

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS      REASON   AGE
persistentvolume/pvc-6e147360-5b18-11e9-a689-001dd8b7007f   1Gi        RWO            Retain           Bound    default/claim2   rook-ceph-block            13m

NAME                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/claim2   Bound    pvc-6e147360-5b18-11e9-a689-001dd8b7007f   1Gi        RWO            rook-ceph-block   13m
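
Since the claim binds but the mount never completes, here is a sketch of the checks I plan to run next. The agent label and the vendor~driver directory name below are assumptions based on Rook v0.9 and RKE defaults, and on RKE the kubelet runs as a Docker container rather than a systemd unit:

# On the node the pod was scheduled to: confirm the Rook FlexVolume driver was
# copied into the kubelet plugin directory (the expected subdirectory name,
# ceph.rook.io~rook-ceph-system, is assumed; it is derived from the operator namespace).
ls /usr/libexec/kubernetes/kubelet-plugins/volume/exec/

# Check the Rook agent logs around the FailedMount timestamps
# (label assumed from the default rook-ceph-agent DaemonSet).
kubectl -n rook-ceph-system logs -l app=rook-ceph-agent --tail=100

# On an RKE node the kubelet logs live in the kubelet container.
docker logs kubelet 2>&1 | grep -i flexvolume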

Expected behavior: The pod should have mounted the volume backed by the PVC.

How to reproduce it (minimal and precise): I followed the wordpress and mysql tutorials:

kubectl create -f operator.yml
kubectl create -f cluster.yml
kubectl create -f storageclass.yml
kubectl create -f test-pod.yml

Contents of test-pod.yml:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim2
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
  - name: test-pod
    image: gcr.io/google_containers/busybox:1.24
    command:
    - "/bin/sh"
    args:
    - "-c"
    - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
    - name: pvc
      mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
  - name: pvc
    persistentVolumeClaim:
      claimName: claim2
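
If the mount works, the busybox container just touches /mnt/SUCCESS and exits, so verifying the fix is a matter of plain kubectl against the names defined above:

kubectl get pod test-pod        # should reach Completed once the volume mounts
kubectl describe pod test-pod   # otherwise the Events section shows the FailedMount warnings
kubectl get pvc claim2          # confirms the claim is Bound to the rook-ceph-block PV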

Environment:

  • OS (e.g. from /etc/os-release):
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.4 LTS
Release:        16.04
Codename:       xenial
  • Kernel (e.g. uname -a): Linux 4.15.0-1036 #38~16.04.1-Ubuntu SMP Fri Dec 7 03:21:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Cloud provider or hardware configuration: Azure Stack

  • Rook version (use rook version inside of a Rook Pod): rook: v0.9.3

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): RKE
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
  cluster:
    id:     af0da3f2-6383-4509-8a87-8a222f48bee4
    health: HEALTH_WARN
            Degraded data redundancy: 36/108 objects degraded (33.333%), 31 pgs degraded, 100 pgs undersized

  services:
    mon: 3 daemons, quorum a,c,d
    mgr: a(active)
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   1 pools, 100 pgs
    objects: 36  objects, 37 MiB
    usage:   13 GiB used, 45 GiB / 58 GiB avail
    pgs:     36/108 objects degraded (33.333%)
             69 active+undersized
             31 active+undersized+degraded
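
Side note on the HEALTH_WARN: undersized/degraded PGs are consistent with a 3-replica pool backed by only two OSDs. A sketch of how to confirm and adjust that from the Rook toolbox, assuming the pool name is the default replicapool from storageclass.yml:

ceph osd pool get replicapool size     # likely returns 3
ceph osd pool set replicapool size 2   # or add a third OSD node instead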

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 36 (15 by maintainers)

Most upvoted comments

OK, I finally got it working by downgrading Kubernetes from 1.15.3 to 1.14.6.

Is someone able to explain why rook doesn’t seem to work on k8s 1.15?

We are experiencing the same issue after upgrading from Kubernetes 1.14 to 1.18.

@averi Not sure that this is the case, as for us it took 8 minutes to mount an empty 100 MB volume.