kubernetes: Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4", GitCommit:"7243c69eb523aa4377bce883e7c0dd76b84709a1", GitTreeState:"clean", BuildDate:"2017-03-07T23:53:09Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4", GitCommit:"7243c69eb523aa4377bce883e7c0dd76b84709a1", GitTreeState:"clean", BuildDate:"2017-03-07T23:34:32Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): container-os
  • Kernel (e.g. uname -a): Linux gke-wordpress-cluster-default-pool-b41e0322-m764 4.4.21+ #1 SMP Fri Feb 17 15:34:45 PST 2017 x86_64 Intel® Xeon® CPU @ 2.60GHz GenuineIntel GNU/Linux
  • Install tools:
  • Others:

What happened:

Warning		FailedMount	Unable to mount volumes for pod "wordpress-4199438522-50xjb_default(5603b982-0ef2-11e7-9fd7-42010a80002d)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-4199438522-50xjb". list of unattached/unmounted volumes=[wordpress-persistent-storage]
  50s		50s		1	{kubelet gke-wordpress-cluster-default-pool-b41e0322-m764}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-4199438522-50xjb". list of unattached/unmounted volumes=[wordpress-persistent-storage]

I was able to bring up WordPress okay the first time, except that GKE wasn't creating load-balancer IPs due to a quota issue, which I resolved; at that point the MySQL pod was up and had attached to its volume. After deleting the WordPress deployment and creating it again, I started getting the errors above. I deleted the MySQL pod as well and brought it back up, and it hit the same issue.
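Roughly the sequence that triggered it (the manifest file and label names here are placeholders for the WordPress/MySQL files I used, not the exact ones):

# Tear down and recreate the WordPress deployment; the replacement pod is the
# one that then fails to mount. Names below are placeholders.
kubectl delete deployment wordpress
kubectl create -f wordpress-deployment.yaml
kubectl describe pod -l app=wordpress    # events show the FailedMount/FailedSync warnings above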

The volumes are backed by a gluster cluster on GCE. Looking at the brick logs on one of the gluster nodes, I see:

[2017-03-22 09:42:14.354542] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-gluster_vol-1-server: accepted client from gluster-1-7439-2017/03/22-09:42:10:325146-gluster_vol-1-client-0-0-0 (version: 3.7.6)
[2017-03-22 09:42:46.355221] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-gluster_vol-1-server: accepted client from gke-wordpress-cluster-default-pool-b41e0322-m764-2447-2017/03/22-09:42:46:301893-gluster_vol-1-client-0-0-0 (version: 3.7.6)
[2017-03-22 09:42:57.316248] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-gluster_vol-1-server: accepted client from gke-wordpress-cluster-default-pool-b41e0322-m764-2730-2017/03/22-09:42:57:272881-gluster_vol-1-client-0-0-0 (version: 3.7.6)
[2017-03-22 10:03:29.117920] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-gluster_vol-1-server: disconnecting connection from gke-wordpress-cluster-default-pool-b41e0322-m764-2730-2017/03/22-09:42:57:272881-gluster_vol-1-client-0-0-0
[2017-03-22 10:03:29.117984] I [MSGID: 101055] [client_t.c:419:gf_client_unref] 0-gluster_vol-1-server: Shutting down connection gke-wordpress-cluster-default-pool-b41e0322-m764-2730-2017/03/22-09:42:57:272881-gluster_vol-1-client-0-0-0
[2017-03-22 10:45:53.074843] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-gluster_vol-1-server: disconnecting connection from gke-wordpress-cluster-default-pool-b41e0322-m764-2447-2017/03/22-09:42:46:301893-gluster_vol-1-client-0-0-0
[2017-03-22 10:45:53.074905] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-gluster_vol-1-server: fd cleanup on /mysql/ib_logfile1
[2017-03-22 10:45:53.074942] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-gluster_vol-1-server: fd cleanup on /mysql/ib_logfile0
[2017-03-22 10:45:53.074997] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-gluster_vol-1-server: fd cleanup on /mysql/ibdata1
[2017-03-22 10:45:53.075112] I [MSGID: 101055] [client_t.c:419:gf_client_unref] 0-gluster_vol-1-server: Shutting down connection gke-wordpress-cluster-default-pool-b41e0322-m764-2447-2017/03/22-09:42:46:301893-gluster_vol-1-client-0-0-0

I’ve tried restarting kubelet on the node. I can’t find the kubelet log file on any of the GKE nodes, and I don’t know how to get the kube-controller-manager and apiserver logs from the master (GKE).
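If the kubelet logs live in the systemd journal rather than a file (which I believe is the case on Container-Optimized OS), something like this should pull them, though I haven't been able to confirm it:

# SSH to the affected node (the zone here is a placeholder) and read the
# kubelet's systemd journal; on these images there is no /var/log/kubelet.log.
gcloud compute ssh gke-wordpress-cluster-default-pool-b41e0322-m764 --zone us-central1-a
sudo journalctl -u kubelet --no-pager | tail -n 200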

I suspect it’s a failure of the glusterfs client on the GKE nodes?
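To check the client side on the node I'd look at something like the following (the log path is just the glusterfs-fuse default, so treat it as an assumption):

# On the affected node: is the fuse mount still there, and what do the
# gluster client logs / kernel messages say?
mount | grep -i gluster
ls /var/log/glusterfs/
dmesg | grep -iE 'gluster|fuse' | tail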

What you expected to happen: Deployment to mount the volumes successfully.

How to reproduce it (as minimally and precisely as possible): Not sure how; I’ve run into this intermittently.

Anything else we need to know: The volumes are backed by gluster
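For reference, the volumes are wired up roughly like this; all names, IPs, sizes, and the volume path below are illustrative, not my exact manifests:

# Illustrative GlusterFS wiring: an Endpoints object pointing at the gluster
# nodes plus a PersistentVolume using the in-tree glusterfs plugin.
# Every name, IP, size, and path here is a placeholder.
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:
      - ip: 10.128.0.2   # gluster-1 (placeholder)
      - ip: 10.128.0.3   # gluster-2 (placeholder)
    ports:
      - port: 1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wordpress-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster
    path: gluster_vol
    readOnly: false
EOF

The wordpress-persistent-storage volume in the deployment then points at a claim bound to a PV of this kind.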

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 47 (29 by maintainers)

Most upvoted comments

@gnufied @jingxu97

I found the issue: I was using c5 instances with NVMe. That's why the volume was never attached: k8s looks for /dev/xvdXX, and with NVMe the device path is different. I did notice that in 1.9 k8s should support NVMe natively, but it seems it's still not stable.
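The device-naming difference is easy to confirm on the node itself; the comments describe what I'd expect to see on a Nitro/c5 instance:

# On c5/m5 (Nitro) instances, EBS volumes attach as NVMe devices, so the
# /dev/xvdXX path that the attach logic waits for never shows up.
ls /dev/xvd*     # no xvd devices on a Nitro instance
ls /dev/nvme*    # e.g. /dev/nvme0n1, /dev/nvme1n1 instead
lsblk            # lists the NVMe block devices and their sizes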

thanks

Running on GKE. I’d rather not launch my product with this bug around. Is there a workaround for it or something I can monitor for? Or maybe it could get hot fixed on GKE? If there is no workaround, this effectively renders GKE useless for stateful stuff.
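The best I can think of to monitor for is the failure events themselves, e.g.:

# Crude detection only: watch cluster events and flag mount/sync failures.
kubectl get events --all-namespaces -w | grep -E 'FailedMount|FailedSync'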

I think I am seeing something like this with c4.xlarge, but I'm unsure of “why” at the moment. It would seem r3.large works perfectly fine.

You don’t need to reboot the node. Deleting the pod and restarting it should work for this issue. Is this workaround OK for your case?
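Something along these lines (the pod name is the one from the events above; the deployment re-creates the pod automatically, and the label selector is an assumption):

# Delete the stuck pod; its Deployment schedules a replacement, which retries
# the attach/mount from scratch.
kubectl delete pod wordpress-4199438522-50xjb
kubectl get pods -l app=wordpress -w      # watch the replacement come up
kubectl describe pod -l app=wordpress     # confirm the volume mounted this time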


  • Jing

@wongma7 Well, I can only say that my volumes were dropped and it didn’t recover on its own (only after a reboot).

Dropped as seen by Kubernetes; I don't know if the volumes were still attached, but they weren’t usable by my scheduled database. To be clear: I’m not “complaining” and this was just test/R&D, but this is the scariest kind of issue to happen in prod, given that it’s a managed service where I can’t find a workaround on my own. Rebooting fixed it in my case, but that’s a big hammer…