kubernetes: Kubernetes keeps failing to mount a volume

Hey guys,

I have been trying out k8s over the weekend and I like it very much. However, I have one big issue. Fairly regularly, my nodes on GKE die on me, which wouldn’t be the end of the world, since k8s is great at bringing up a new node and rescheduling the pods. However, I’m running postgres with persistent volumes, and every so often that fails, presumably because the unmount/detach of the disk from one physical node and the reattach to the other doesn’t go perfectly. That’s at least my newbie suspicion.
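A rough way to check that suspicion (the pod, disk, node, and zone names below are placeholders):

# Which node did the rescheduled postgres pod land on?
kubectl get pods -o wide | grep postgres

# Which instance does GCE think the PD is attached to?
gcloud compute disks describe <pd-name> --zone <zone> --format='value(users)'

# If the disk is still listed as attached to the old (dead) node while the pod
# is scheduled on a new one, the detach never completed, which would explain
# the mount timeout below.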

The concrete error message I get is:

Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "postgres

What am I missing?

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 6
  • Comments: 81 (31 by maintainers)

Most upvoted comments

I’m experiencing the same issue, first on version 1.6.2, and it’s still happening after I upgraded to version 1.6.4 (running on GKE). It’s now happening daily, and apparently I’m not the only one with the problem. Can’t we reopen this issue? It seems strange to me that it’s closed, since it clearly isn’t solved yet.

@jingxu97 were you able to figure out the root cause and perhaps a possible solution? Is there any way I can help?

Confirmed 1.2.5 resolves the issue. @saad-ali I’m happy to spin up another cluster to attempt to recreate the issue.

@jingxu97 I also have a GKE cluster with the same issue. Version:

Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7", GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0", GitTreeState:"clean", BuildDate:"2016-12-10T04:43:42Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

kubelet log:

Jan 30 10:11:03 kubelet[8847]: E0130 10:11:03.634337    8847 kubelet.go:1819] Unable to mount volumes for pod "federation-apiserver-2942415793-zt37d_federation(19d51f12-d366-11e6-8ae7-42010a840082)": timeout expired waiting for volumes to attach/mount for pod "federation-apiserver-2942415793-zt37d"/"federation". list of unattached/unmounted volumes=[etcd-data]; skipping pod
Jan 30 10:11:03 kubelet[8847]: E0130 10:11:03.634385    8847 pod_workers.go:184] Error syncing pod 19d51f12-d366-11e6-8ae7-42010a840082, skipping: timeout expired waiting for volumes to attach/mount for pod "federation-apiserver-2942415793-zt37d"/"federation". list of unattached/unmounted volumes=[etcd-data]

The PD is attached to the node and I can manually mount and write to it.
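The manual check described above would look something like this (device name, zone, and mount point are placeholders; the exact device will differ per node):

# Confirm GCE shows the PD attached to this node
gcloud compute instances describe <node-name> --zone <zone> --format='value(disks[].deviceName)'

# On the node: find the device and mount it by hand
lsblk
sudo mkdir -p /mnt/pd-test
sudo mount /dev/sdb /mnt/pd-test    # device name will differ
sudo touch /mnt/pd-test/write-test && ls /mnt/pd-test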

Hey people, I am experiencing this very same issue on a Kubernetes cluster on GKE v1.4.7 and am wondering how exactly to debug it. I’ve got the kubelet.log file.

Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "my-pod"/"default". list of unattached/unmounted volumes=[my-pvc default-token-xuwj3]

I have already recreated the cluster from scratch, and it starts failing to mount volumes again. Once the first volume mount fails, all other pods (if restarted) fail as well.

Thanks!
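For anyone else stuck on the same timeout, a few generic places to look (the pod name and namespace are placeholders):

# The pod's Events section usually carries the detailed FailedMount / FailedAttachVolume message
kubectl describe pod my-pod

# Attach/mount failures also show up as cluster events
kubectl get events --all-namespaces | grep -iE 'attach|mount'

# On the node, the kubelet log records each attach/mount attempt
journalctl -u kubelet | grep -i volume    # or /var/log/kubelet.log on older GKE images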

A quick fix is to remove partition: 1 from your deployment and let the kubelet do the formatting for you.
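A minimal sketch of that change, assuming the pod mounts the PD directly through a gcePersistentDisk volume (the deployment name and volume index are placeholders):

# Option 1: edit the manifest and delete the "partition: 1" line under gcePersistentDisk
kubectl edit deployment postgres

# Option 2: remove the field in place with a JSON patch
kubectl patch deployment postgres --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/volumes/0/gcePersistentDisk/partition"}]'

With the partition field gone, the kubelet mounts the whole disk and will format it on first use if it has no filesystem.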

Thanks for the info.

> I first noticed the problems on GKE 1.3.2, where the default-token suddenly could not be mounted anymore

“The default-token suddenly could not be mounted anymore” issue is tracked by https://github.com/kubernetes/kubernetes/issues/28750#issuecomment-231814056 and should be fixed (or at least much less likely to cause problems) in v1.3.4.

> Then all my persistent disks could not be mounted

This is the issue that I’m really interested in.

> but if I can find the time, I will try to isolate the problems and send some logs.

That would be great!

> @saad-ali I’m happy to spin up another cluster to attempt to recreate the issue.

@rosskukulinski That would be great. If you get a repro, please share the logs I requested above.

Downgrading to 1.2.5 fixed my issues. I first noticed the problems on GKE 1.3.2, where the default-token suddenly could not be mounted anymore. Later I deleted the whole pool and created a new one. Then all my persistent disks could not be mounted, but the default-token was working again. I downgraded to 1.3.0 and the issue still existed; downgrading to 1.2.5 works. I can’t promise anything, but if I can find the time, I will try to isolate the problems and send some logs.