kubernetes: [GKE] Pod Can't Mount Volumes (Fails to Create Paths), Won't Leave ContainerCreating State

We have a disk “databases-us-central1-b-kube” that contains some protein databases (BLAST non-redundant, PDB, HHsearch, etc.). It’s supposed to be mounted read-only.

We had a replication controller/replica in our “development” namespace (created long ago) that was accidentally set to mount this drive read-write. It was previously the only thing using the drive, so we had no issues and didn’t notice. Recently, however, we tried to add a second controller/replica that mounts the same drive read-only. When we noticed the mistake in the first controller, we changed its configuration to expose the drive as read-only, deleted the replication controller with kubectl delete rc/theservice, and re-created it with kubectl create -f theservice-controller.yml.
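
The change amounts to flipping the readOnly flag on the gcePersistentDisk volume in theservice-controller.yml. A minimal sketch of the relevant stanza (disk name as above; the structure mirrors the rc config dumped further down, everything else omitted):

volumes:
- name: databases
  gcePersistentDisk:
    pdName: databases-us-central1-b-kube
    fsType: ext4
    readOnly: true   # defaults to false (read-write); this flag is what we changed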

Now we’re having this awesome problem where almost nothing can mount disks. I had the bright idea of restarting each node in the cluster via the Google Cloud Console in an attempt to fix things, and now anything that mounts a disk is failing to start.

There is some collateral damage here from services that depend on things that are failing, but I think this illustrates the problem fairly clearly. Everything listed here in an Error state was running fine with its disks happily attached until I rebooted the nodes; the original controller was deleted before the reboot:

$ kubectl get pods --all-namespaces # post cluster recycle, reboot, etc.

NAMESPACE     NAME                                                                       READY     STATUS             RESTARTS   AGE
daemons       athingath-soka2                                                            1/1       Running            1          1d
daemons       blahblah-u8mjb                                                             1/1       Running            1          1d
daemons       blahblah-wshsd                                                             1/1       Running            1          1d
development   cthin-298yb                                                                1/1       Running            1          1d
development   dth-n5ce0                                                                  4/4       Running            4          21h
development   eth-zttp5                                                                  1/1       Running            1          1d
development   fthingf-uc7v1                                                              3/3       Running            3          14h
development   gthinggt-yjb3n                                                             1/1       Running            1          1d
development   hthinghti-yrtap                                                            1/1       Running            1          1d
development   ithingi-plbf1                                                              1/1       Running            1          1d
development   jthingjt-9r0s3                                                             0/2       Error              0          23h
development   kthingk-cc3sf                                                              1/1       Running            2          1d
kube-system   fluentd-cloud-logging-gke-nananananananananaanananananananananaanan-oc5l   1/1       Running            1          1d
kube-system   fluentd-cloud-logging-gke-nananananananananaanananananananananaanan-xr5o   1/1       Running            1          1d
kube-system   heapster-v1.0.2-595387591-8mwzx                                            2/2       Running            2          1d
kube-system   kube-dns-v11-o56o6                                                         4/4       Running            4          1d
kube-system   kube-proxy-gke-nananananananananaanananananananananaanan-oc5l              1/1       Running            1          1d
kube-system   kube-proxy-gke-nananananananananaanananananananananaanan-xr5o              1/1       Running            1          1d
kube-system   kubernetes-dashboard-v1.0.1-816o3                                          1/1       Running            1          1d
kube-system   l7-lb-controller-v0.6.0-tov5q                                              2/2       Running            8          1d
production    cthin-4rux8                                                                2/2       Running            2          1d
production    dth-hjg5f                                                                  4/4       Running            9          1d
production    eth-1buwt                                                                  2/2       Running            2          1d
production    fthingf-fyr51                                                              3/3       Running            9          1d
production    gthinggt-mk8ok                                                             2/2       Running            2          1d
production    hthinghti-94r7k                                                            2/2       Running            2          1d
production    ithingi-lkrzj                                                              0/1       Error              0          1d
production    jthingjt-wlnpt                                                             0/3       Error              1          1d
production    kthingk-qfuyg                                                              2/2       Running            2          1d
staging       cthin-8co0y                                                                1/1       Running            1          1d
staging       dth-2jppb                                                                  3/4       CrashLoopBackOff   16         1d
staging       eth-qfh2z                                                                  1/1       Running            1          1d
staging       fthingf-a5x7l                                                              3/3       Running            4          1d
staging       gthinggt-bdpui                                                             1/1       Running            1          1d
staging       hthinghti-6vxcz                                                            1/1       Running            1          1d
staging       ithingi-mi93m                                                              0/1       Error              0          1d
staging       jthingjt-aloy7                                                             0/2       Error              2          1d
staging       kthingk-3l3ja                                                              1/1       Running            1          1d

Here are some debugging outputs:

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.2", GitCommit:"528f879e7d3790ea4287687ef0ab3f2a01cc2718", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}

$ kubectl get events # pre cluster recycle & reboot

FIRSTSEEN   LASTSEEN   COUNT     NAME            KIND                    SUBOBJECT                   TYPE      REASON             SOURCE                                                 MESSAGE
10h         27s        609       thing-apz0y     Pod                                                 Warning   FailedMount        {kubelet gke-redacted-node-rq7o}   Unable to mount volumes for pod "thing-apz0y_development(redacted-uuid)": Could not attach GCE PD "databases-us-central1-b-kube". Timeout waiting for mount paths to be created.
10h         27s        609       thing-apz0y     Pod                                                 Warning   FailedSync         {kubelet gke-redacted-node-rq7o}   Error syncing pod, skipping: Could not attach GCE PD "databases-us-central1-b-kube". Timeout waiting for mount paths to be created.

$ kubectl get pods

NAME              READY     STATUS              RESTARTS   AGE
...
thing-apz0y       0/2       ContainerCreating   0          10h
...

Here’s an rc config from the horse’s mouth:

$ kubectl edit rc/athing

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2016-05-18T...
  generation: 1
  labels:
    name: thing
  name: thing
  namespace: development
  resourceVersion: "redacted"
  selfLink: /api/v1/namespaces/development/replicationcontrollers/thing
  uid: redacted-uid
spec:
  replicas: 1
  selector:
    name: thing
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: thing
        track: build
    spec:
      containers:
      - command:
        - fab
        - uwsgi
        env:
        ...
        image: gcr.io/cyrusmolcloud/thing:build
        imagePullPolicy: Always
        name: thing
        ports:
        - containerPort: 80
          name: thing-web
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /var/databases
          name: databases
      - command:
        - fab
        - runworkers
        env:
        ...
        image: gcr.io/cyrusmolcloud/thing:build
        imagePullPolicy: Always
        name: worker
        resources: {}
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - gcePersistentDisk:
          fsType: ext4
          pdName: databases-us-central-1-kube
          readOnly: true
        name: databases
status:
  fullyLabeledReplicas: 1
  observedGeneration: 1
  replicas: 1

So, the great thing is that some/all of the drives are actually attached to the nodes:

$ gcloud compute instances describe gke-redacted-node-rq7o

canIpForward: true
cpuPlatform: Intel Haswell
creationTimestamp: '2016-05-16T...'
disks:
- autoDelete: true
  boot: true
  deviceName: persistent-disk-0
  index: 0
  interface: SCSI
  kind: compute#attachedDisk
  licenses:
  - https://www.googleapis.com/compute/v1/projects/gke-node-images/global/licenses/gke-node
  mode: READ_WRITE
  source: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/disks/gke-redacted-node-rq7o
  type: PERSISTENT
- autoDelete: false
  boot: false
  deviceName: databases-us-central1-b-kube
  index: 1
  interface: SCSI
  kind: compute#attachedDisk
  mode: READ_ONLY
  source: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/disks/databases-us-central1-b-kube
  type: PERSISTENT
- autoDelete: false
  boot: false
  deviceName: databases-us-central-1-kube-1
  index: 2
  interface: SCSI
  kind: compute#attachedDisk
  mode: READ_ONLY
  source: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/disks/databases-us-central-1-kube-1
  type: PERSISTENT
id: 'redacted'
kind: compute#instance
machineType: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/machineTypes/n1-standard-16
metadata:
...

I can’t even see any clear patterns in the failures.
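
For reference, the GCE-side attachments can be listed and, if one turns out to be stale, removed by hand with gcloud. The instance, disk, and zone names below are taken from the output above (the disk picked for detach-disk is just an example from the attachment list); a manual detach is only a workaround sketch, not something the cluster should normally need:

$ gcloud compute instances describe gke-redacted-node-rq7o --zone us-central1-b --format 'yaml(disks)'   # show only the attached disks
$ gcloud compute instances detach-disk gke-redacted-node-rq7o --disk databases-us-central-1-kube-1 --zone us-central1-b   # drop a stale attachment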

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 2
  • Comments: 15 (5 by maintainers)

Most upvoted comments

I’ve upgraded to 1.3, but I’m still having the issue where ROX volume claims are not mounted properly.

$ kubectl get pv

NAME        CAPACITY   ACCESSMODES   STATUS    CLAIM                   REASON    AGE
databases   500Gi      ROX           Bound     a-namespace/databases             7m

$ kubectl get pvc --namespace=a-namespace

NAME        STATUS    VOLUME      CAPACITY   ACCESSMODES   AGE
databases   Bound     databases   0                        7m

$ kubectl get events

...
...timeout expired waiting for volumes to attach/mount for pod "mypod-id"/"a-namespace". list of unattached/unmounted volumes=[databases]
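
For context, the PV/PVC pair behind that output would look roughly like the sketch below. The names, capacity, and access mode are taken from the kubectl output above; the pdName and fsType are assumptions, since they aren’t shown:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: databases
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadOnlyMany
  gcePersistentDisk:
    pdName: databases-us-central1-b-kube   # assumed; not visible in the output above
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: databases
  namespace: a-namespace
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 500Gi

The pod then references the claim read-only in its volumes list:

volumes:
- name: databases
  persistentVolumeClaim:
    claimName: databases
    readOnly: true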