kubernetes: VerifyVolumesAreAttached is failing and looping on remounts for openstack cinder

Is this a request for help?: Not necessarily, things appear to be working

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): #28962 looks similar


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T07:31:07Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: OpenStack with CinderV1 API
  • OS (e.g. from /etc/os-release): Container Linux by CoreOS 1235.6.0
  • Kernel (e.g. uname -a): Linux kube-master-02.openstacklocal 4.7.3-coreos-r2 #1 SMP Sun Jan 8 00:32:25 UTC 2017 x86_64 Intel Xeon E312xx (Sandy Bridge) GenuineIntel GNU/Linux
  • Install tools: Kargo
  • Others:

What happened: I was following the guide for a StatefulSet ZooKeeper install. Things appear to have booted up and the pods are stable, but I see errors in the UI: Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"zk-0". list of unattached/unmounted volumes=[datadir]. This led me to check the hyperkube controller-manager’s logs, where I see a stream of messages about remounting the ZK volumes:

I0115 21:46:51.541418       1 node_status_updater.go:135] Updating status for node "kube-node-04" succeeded. patchBytes: "{\"status\":{\"volumesAttached\":[{\"devicePath\":\"/dev/vdb\",\"name\":\"kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb\"}]}}" VolumesAttached: [{kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb /dev/vdb}]
I0115 21:46:55.566696       1 attacher.go:140] VolumesAreAttached: check volume "efeddaeb-ed27-4e92-9733-f46251cee3cb" (specName: "pvc-dc5e1785-dafe-11e6-b025-fa163e0158f7") is no longer attached
I0115 21:46:55.566758       1 operation_executor.go:565] VerifyVolumesAreAttached determined volume "kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb" (spec.Name: "pvc-dc5e1785-dafe-11e6-b025-fa163e0158f7") is no longer attached to node %!q(MISSING), therefore it was marked as detached.
I0115 21:46:55.610216       1 reconciler.go:213] Started AttachVolume for volume "kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb" to node "kube-node-04"
I0115 21:46:56.180376       1 attacher.go:92] Attach operation is successful. volume "efeddaeb-ed27-4e92-9733-f46251cee3cb" is already attached to node "a1c515d3-2f79-4067-a7c7-e7cac564a60b".
I0115 21:46:56.508940       1 operation_executor.go:620] AttachVolume.Attach succeeded for volume "kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb" (spec.Name: "pvc-dc5e1785-dafe-11e6-b025-fa163e0158f7") from node "kube-node-04".
I0115 21:46:56.721459       1 node_status_updater.go:135] Updating status for node "kube-node-04" succeeded. patchBytes: "{\"status\":{\"volumesAttached\":[{\"devicePath\":\"/dev/vdb\",\"name\":\"kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb\"}]}}" VolumesAttached: [{kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb /dev/vdb}]
I0115 21:47:00.617475       1 attacher.go:140] VolumesAreAttached: check volume "efeddaeb-ed27-4e92-9733-f46251cee3cb" (specName: "pvc-dc5e1785-dafe-11e6-b025-fa163e0158f7") is no longer attached
I0115 21:47:00.617563       1 operation_executor.go:565] VerifyVolumesAreAttached determined volume "kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb" (spec.Name: "pvc-dc5e1785-dafe-11e6-b025-fa163e0158f7") is no longer attached to node %!q(MISSING), therefore it was marked as detached.
I0115 21:47:00.835167       1 reconciler.go:213] Started AttachVolume for volume "kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb" to node "kube-node-04"
I0115 21:47:01.457503       1 attacher.go:92] Attach operation is successful. volume "efeddaeb-ed27-4e92-9733-f46251cee3cb" is already attached to node "a1c515d3-2f79-4067-a7c7-e7cac564a60b".
I0115 21:47:01.743547       1 operation_executor.go:620] AttachVolume.Attach succeeded for volume "kubernetes.io/cinder/efeddaeb-ed27-4e92-9733-f46251cee3cb" (spec.Name: "pvc-dc5e1785-dafe-11e6-b025-fa163e0158f7") from node "kube-node-04".

What you expected to happen: Things should be quiet; the cluster should not be retrying the mount. Although it appears to be a no-op, it is noisy, and I fear the constant API calls will add load to our OpenStack cluster.

How to reproduce it (as minimally and precisely as possible):

Try the ZooKeeper config on an OpenStack cluster. In my case, I had to create my own storage class (for example, a Ceph-backed storage system) and then update the YAML to use that storage class:

kubectl create -f http://k8s.io/docs/tutorials/stateful-application/zookeeper.yaml
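For illustration, a storage class using the in-tree Cinder provisioner might look like the following sketch. The class name and availability zone here are made-up placeholders for this cluster; only the `kubernetes.io/cinder` provisioner name is standard:

```yaml
# Sketch of a Cinder-backed StorageClass; "cinder-standard" and the
# "availability" value are hypothetical and must match your OpenStack setup.
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1   # v1beta1 matches the 1.5-era API
metadata:
  name: cinder-standard
provisioner: kubernetes.io/cinder
parameters:
  availability: nova
```

The StatefulSet's volumeClaimTemplates would then reference this class (in 1.5, via the `volume.beta.kubernetes.io/storage-class` annotation).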

Anything else we need to know:

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 17 (10 by maintainers)

Most upvoted comments

This is fixed in https://github.com/kubernetes/kubernetes/pull/39998; we need to cherry-pick it to the release-1.5 branch. @xrl, if you don’t mind, you can cherry-pick the commit to the release-1.5 branch and open a PR. You can run:

GITHUB_USER=gnufied ./hack/cherry_pick_pull.sh upstream/release-1.4 41455

Replace the GitHub ID, branch name, and PR number with your own.

I think this is similar to what we discovered in https://github.com/kubernetes/kubernetes/pull/39551/files. By default, Kubernetes verifies that volumes are indeed attached to nodes every 5 seconds. As a workaround, you can increase that duration.

But increasing the polling duration should be done with some caution. cc @jingxu97 @chrislovecnm
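As a sketch of that workaround: later kube-controller-manager releases expose the re-verification interval as the `--attach-detach-reconcile-sync-period` flag. Treat this as an assumption to check against your own build, since the flag may not exist or may behave differently in 1.5:

```shell
# Sketch: raise the interval at which the attach/detach controller
# re-verifies that volumes are attached. Confirm the flag exists in
# your kube-controller-manager version before relying on it.
kube-controller-manager \
  --attach-detach-reconcile-sync-period=1m \
  --leader-elect=true   # ...plus your existing flags, unchanged
```

A longer period reduces Cinder API traffic at the cost of slower detection of genuinely detached volumes, which is why the comment above urges caution.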