kubernetes: [GKE] Pod Can't Mount Volumes (Fails to Create Paths), Won't Leave ContainerCreating State
We have a disk “databases-us-central1-b-kube” that contains some protein databases (BLAST non-redundant, PDB, HHsearch, etc.). It’s supposed to be mounted read-only.
We had a replication controller/replica in our “development” namespace (created long ago) that was accidentally set to mount this drive as read-write. It was previously the only thing using the drive, so we had no issues then and didn’t notice. Recently, however, we tried to add a second controller/replica that mounts the same drive as read-only. When we noticed our mistake in the first controller, we changed its configuration to mount the drive as read-only, deleted the replication controller with kubectl delete rc/theservice, and re-created it with kubectl create -f theservice-controller.yml.
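For reference, the swap went roughly like this (a sketch; the service name, file name, and namespace are placeholders standing in for our actual ones):

```shell
# 1. Edit theservice-controller.yml so the gcePersistentDisk volume
#    has readOnly: true, then replace the running controller:
kubectl delete rc/theservice --namespace=development
kubectl create -f theservice-controller.yml --namespace=development
```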
Now we’re having this awesome problem where almost nothing can mount disks. I had the bright idea of restarting each node in the cluster via the Google Cloud console in an attempt to fix things, and now everything that mounts a disk is failing to start.
There is some collateral damage here from services that depend on the failing components, but I think this illustrates the problem fairly clearly. Everything listed here in Error state was running fine with its disks happily attached until I rebooted the nodes; the original controller was deleted before the reboot:
$ kubectl get pods --all-namespaces # post cluster recycle, reboot, etc.
NAMESPACE NAME READY STATUS RESTARTS AGE
daemons athingath-soka2 1/1 Running 1 1d
daemons blahblah-u8mjb 1/1 Running 1 1d
daemons blahblah-wshsd 1/1 Running 1 1d
development cthin-298yb 1/1 Running 1 1d
development dth-n5ce0 4/4 Running 4 21h
development eth-zttp5 1/1 Running 1 1d
development fthingf-uc7v1 3/3 Running 3 14h
development gthinggt-yjb3n 1/1 Running 1 1d
development hthinghti-yrtap 1/1 Running 1 1d
development ithingi-plbf1 1/1 Running 1 1d
development jthingjt-9r0s3 0/2 Error 0 23h
development kthingk-cc3sf 1/1 Running 2 1d
kube-system fluentd-cloud-logging-gke-nananananananananaanananananananananaanan-oc5l 1/1 Running 1 1d
kube-system fluentd-cloud-logging-gke-nananananananananaanananananananananaanan-xr5o 1/1 Running 1 1d
kube-system heapster-v1.0.2-595387591-8mwzx 2/2 Running 2 1d
kube-system kube-dns-v11-o56o6 4/4 Running 4 1d
kube-system kube-proxy-gke-nananananananananaanananananananananaanan-oc5l 1/1 Running 1 1d
kube-system kube-proxy-gke-nananananananananaanananananananananaanan-xr5o 1/1 Running 1 1d
kube-system kubernetes-dashboard-v1.0.1-816o3 1/1 Running 1 1d
kube-system l7-lb-controller-v0.6.0-tov5q 2/2 Running 8 1d
production cthin-4rux8 2/2 Running 2 1d
production dth-hjg5f 4/4 Running 9 1d
production eth-1buwt 2/2 Running 2 1d
production fthingf-fyr51 3/3 Running 9 1d
production gthinggt-mk8ok 2/2 Running 2 1d
production hthinghti-94r7k 2/2 Running 2 1d
production ithingi-lkrzj 0/1 Error 0 1d
production jthingjt-wlnpt 0/3 Error 1 1d
production kthingk-qfuyg 2/2 Running 2 1d
staging cthin-8co0y 1/1 Running 1 1d
staging dth-2jppb 3/4 CrashLoopBackOff 16 1d
staging eth-qfh2z 1/1 Running 1 1d
staging fthingf-a5x7l 3/3 Running 4 1d
staging gthinggt-bdpui 1/1 Running 1 1d
staging hthinghti-6vxcz 1/1 Running 1 1d
staging ithingi-mi93m 0/1 Error 0 1d
staging jthingjt-aloy7 0/2 Error 2 1d
staging kthingk-3l3ja 1/1 Running 1 1d
Here are some debugging outputs:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.2", GitCommit:"528f879e7d3790ea4287687ef0ab3f2a01cc2718", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}
$ kubectl get events # pre cluster recycle & reboot
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
10h 27s 609 thing-apz0y Pod Warning FailedMount {kubelet gke-redacted-node-rq7o} Unable to mount volumes for pod "thing-apz0y_development(redacted-uuid)": Could not attach GCE PD "databases-us-central1-b-kube". Timeout waiting for mount paths to be created.
10h 27s 609 thing-apz0y Pod Warning FailedSync {kubelet gke-redacted-node-rq7o} Error syncing pod, skipping: Could not attach GCE PD "databases-us-central1-b-kube". Timeout waiting for mount paths to be created.
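Given those FailedMount events, one manual check worth trying (a guess at a recovery step, not something we've confirmed) is asking GCE directly which instances it thinks the disk is attached to, and detaching any stale attachment by hand:

```shell
# Which instances does GCE think the disk is attached to?
gcloud compute disks describe databases-us-central1-b-kube \
    --zone=us-central1-b --format="value(users)"

# If a stale attachment is listed, detach it manually and let the
# kubelet re-attach the disk on its next sync.
gcloud compute instances detach-disk gke-redacted-node-rq7o \
    --disk=databases-us-central1-b-kube --zone=us-central1-b
```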
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
...
thing-apz0y 0/2 ContainerCreating 0 10h
...
Here’s the rc config, straight from the horse’s mouth:
$ kubectl edit rc/athing
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2016-05-18T...
  generation: 1
  labels:
    name: thing
  name: thing
  namespace: development
  resourceVersion: "redacted"
  selfLink: /api/v1/namespaces/development/replicationcontrollers/thing
  uid: redacted-uid
spec:
  replicas: 1
  selector:
    name: thing
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: thing
        track: build
    spec:
      containers:
      - command:
        - fab
        - uwsgi
        env:
        ...
        image: gcr.io/cyrusmolcloud/thing:build
        imagePullPolicy: Always
        name: thing
        ports:
        - containerPort: 80
          name: thing-web
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /var/databases
          name: databases
      - command:
        - fab
        - runworkers
        env:
        ...
        image: gcr.io/cyrusmolcloud/thing:build
        imagePullPolicy: Always
        name: worker
        resources: {}
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - gcePersistentDisk:
          fsType: ext4
          pdName: databases-us-central-1-kube
          readOnly: true
        name: databases
status:
  fullyLabeledReplicas: 1
  observedGeneration: 1
  replicas: 1
So, the great thing is that some (or all) of the drives are actually attached to the nodes:
$ gcloud compute instances describe gke-redacted-node-rq7o
canIpForward: true
cpuPlatform: Intel Haswell
creationTimestamp: '2016-05-16T...'
disks:
- autoDelete: true
  boot: true
  deviceName: persistent-disk-0
  index: 0
  interface: SCSI
  kind: compute#attachedDisk
  licenses:
  - https://www.googleapis.com/compute/v1/projects/gke-node-images/global/licenses/gke-node
  mode: READ_WRITE
  source: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/disks/gke-redacted-node-rq7o
  type: PERSISTENT
- autoDelete: false
  boot: false
  deviceName: databases-us-central1-b-kube
  index: 1
  interface: SCSI
  kind: compute#attachedDisk
  mode: READ_ONLY
  source: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/disks/databases-us-central1-b-kube
  type: PERSISTENT
- autoDelete: false
  boot: false
  deviceName: databases-us-central-1-kube-1
  index: 2
  interface: SCSI
  kind: compute#attachedDisk
  mode: READ_ONLY
  source: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/disks/databases-us-central-1-kube-1
  type: PERSISTENT
id: 'redacted'
kind: compute#instance
machineType: https://www.googleapis.com/compute/v1/projects/cyrusmolcloud/zones/us-central1-b/machineTypes/n1-standard-16
metadata:
  ...
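To compare attachment state across nodes, something like this might help (a sketch; the node names and zone are taken from the output above, and the exact rendering of gcloud's value() projection for repeated fields may vary):

```shell
# Summarize the device names and modes of the disks attached to each
# node, so read-only vs read-write attachments can be compared.
for node in gke-redacted-node-rq7o gke-redacted-node-xr5o; do
  echo "== $node"
  gcloud compute instances describe "$node" --zone=us-central1-b \
      --format="value(disks[].deviceName,disks[].mode)"
done
```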
I can’t even see any clear patterns in the failures.
About this issue
- State: closed
- Created 8 years ago
- Reactions: 2
- Comments: 15 (5 by maintainers)
I’ve upgraded to 1.3, but I’m still having the issue where ROX (ReadOnlyMany) volume claims are not mounted properly.
$ kubectl get pv
$ kubectl get pvc --namespace=a-namespace
$ kubectl get events
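For the ROX case, the shape we'd expect the objects to take (a sketch only; the PV/PVC names and storage size are placeholders, and only the disk name is from this issue) is a PersistentVolume/PersistentVolumeClaim pair that both declare ReadOnlyMany and mark the disk read-only:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: databases-pv            # placeholder name
spec:
  capacity:
    storage: 500Gi              # placeholder size
  accessModes:
  - ReadOnlyMany
  gcePersistentDisk:
    pdName: databases-us-central1-b-kube
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: databases-claim         # placeholder name
  namespace: a-namespace
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 500Gi            # must match the PV's capacity or less
```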