kubernetes: EBS detach fails and volume remains busy - v1.5.0-beta.2
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):
- No
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
- ebs
- ebs detach
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Kubernetes version (use `kubectl version`):
Client Version: version.Info{Major:"1", Minor:"5+", GitVersion:"v1.5.0-beta.2", GitCommit:"0776eab45fe28f02bbeac0f05ae1a203051a21eb", GitTreeState:"clean", BuildDate:"2016-11-24T22:35:03Z", GoVersion:"go1.7.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5+", GitVersion:"v1.5.0-beta.2", GitCommit:"0776eab45fe28f02bbeac0f05ae1a203051a21eb", GitTreeState:"clean", BuildDate:"2016-11-24T22:30:23Z", GoVersion:"go1.7.3", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): Ubuntu 16.04.1 LTS
- Kernel (e.g. `uname -a`):
Linux kubecontroller-1 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: None
- Others: None
What happened:
I’ve seen this behavior in 1.4.6 and earlier. When a pod with a PersistentVolumeClaim (in this case one provisioned through a StorageClass) is “moved” to another node (terminated on one node, then run on another), the persistent volume occasionally has trouble detaching from the old node. The move can be triggered either by `kubectl apply -f <new_configuration.yml>` or, as in this particular example, by draining a node with a command such as:
`kubectl drain worker-3`
This does sometimes work. Even when it works, there are plenty of errors around the EBS volume: the kube-controller-manager logs many errors saying the volume cannot be attached because it has not yet detached, such as:
Dec 01 17:41:47 kubecontroller-1 kube-controller-manager[5662]: E1201 17:41:47.344806 5662 attacher.go:73] Error attaching volume "aws://us-east-1c/vol-802faf11": Error attaching EBS volume "vol-802faf11" to instance "i-ca043f59": VolumeInUse: vol-802faf11 is already attached to an instance
Dec 01 17:41:47 kubecontroller-1 kube-controller-manager[5662]: status code: 400, request id:
There will be other `nestedpendingoperations.go` operation errors. I’m not sure if that’s a symptom of a misconfiguration. Again, it appears to SOMETIMES work. IF/WHEN it works, I’ll see this in the kube-controller log:
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: I1201 17:42:30.821032 5662 aws.go:1492] AttachVolume volume="vol-802faf11" instance="i-ca043f59" request returned {
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: AttachTime: 2016-12-01 17:42:30.662 +0000 UTC,
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: Device: "/dev/xvdba",
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: InstanceId: "i-ca043f59",
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: State: "attaching",
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: VolumeId: "vol-802faf11"
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: }
Dec 01 17:42:30 kubecontroller-1 kube-controller-manager[5662]: I1201 17:42:30.920339 5662 aws.go:1366] Waiting for volume "vol-802faf11" state: actual=attaching, desired=attached
Dec 01 17:42:41 kubecontroller-1 kube-controller-manager[5662]: I1201 17:42:41.019194 5662 aws.go:1265] Releasing in-process attachment entry: ba -> volume vol-802faf11
But occasionally it fails and the volume never detaches. On the kubelet node, this log appears:
Dec 01 19:38:55 worker-3 kubelet[1032]: I1201 19:38:55.432000 1032 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-802faf11" (spec.Name: "ebstest-volume") from pod "fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336" (UID: "fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336").
Dec 01 19:38:55 worker-3 kubelet[1032]: I1201 19:38:55.432070 1032 aws_ebs.go:398] Error checking if mountpoint /var/lib/kubelet/pods/fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336/volumes/kubernetes.io~aws-ebs/pvc-bfd01b97-b7d2-11e6-8057-0e71c9ba25de: stat /var/lib/kubelet/pods/fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336/volumes/kubernetes.io~aws-ebs/pvc-bfd01b97-b7d2-11e6-8057-0e71c9ba25de: no such file or directory
Dec 01 19:38:55 worker-3 kubelet[1032]: E1201 19:38:55.432132 1032 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-802faf11\" (\"fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336\")" failed. No retries permitted until 2016-12-01 19:40:55.432095954 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-802faf11" (volume.spec.Name: "ebstest-volume") pod "fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336" (UID: "fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336") with: stat /var/lib/kubelet/pods/fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336/volumes/kubernetes.io~aws-ebs/pvc-bfd01b97-b7d2-11e6-8057-0e71c9ba25de: no such file or directory
And it will repeat. If I look at the path specified above on that particular kubelet, `/var/lib/kubelet/pods/fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336/volumes/kubernetes.io~aws-ebs/` exists, but the `pvc-*` directory does not.
On the kubelet I can actually see the mounted volume. Again, sometimes the move is successful and the pod comes up on another node; sometimes not. (Roughly what I’m checking on the node is sketched below.)
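For completeness, this is roughly what I’m checking on the node; the `mount | grep` pattern is just how I’d look for the aws-ebs plugin’s mount point, not an exact transcript:

```sh
# On worker-3 (the node being drained): the pod's aws-ebs volume directory
# exists, but the pvc-* mount point under it is gone.
ls -la /var/lib/kubelet/pods/fff0ca70-b7ee-11e6-a0c7-0e82f0b9f336/volumes/kubernetes.io~aws-ebs/

# Yet the EBS block device still shows up as mounted on the node.
mount | grep aws-ebs
```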
What you expected to happen:
Upon `kubectl drain worker-3`, the following actions should occur:
- Pod terminated.
- EBS persistent volume unmounted and detached from the old node.
- EBS persistent volume attached and mounted on another node.
- Pod created and run successfully on the node where the EBS volume now is.
How to reproduce it (as minimally and precisely as possible):
- Create a StorageClass for EBS volumes, a PersistentVolumeClaim, and a Deployment whose pod uses the resulting EBS volume (see the manifest sketch after this list).
- Drain the node the pod is running on. Sometimes this will work as intended.
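For reference, a minimal sketch of the kind of manifests and commands involved; the class/claim/deployment names, the container image, and the volume size here are placeholders rather than my exact files, which I can still share:

```sh
# Storage class, claim, and a single-replica deployment using the claim
# (1.5 still uses the beta storage-class annotation on the PVC).
cat <<'EOF' | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: ebs-gp2
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ebstest-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: ebs-gp2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ebstest
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ebstest
    spec:
      containers:
        - name: ebstest
          image: nginx
          volumeMounts:
            - name: ebstest-volume
              mountPath: /data
      volumes:
        - name: ebstest-volume
          persistentVolumeClaim:
            claimName: ebstest-claim
EOF

# Once the pod is running, drain the node it landed on and watch whether the
# volume detaches from the old node and re-attaches on the new one.
kubectl drain worker-3
```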
Anything else do we need to know:
- I can provide the yml files as necessary.
- There is a second EBS volume, attached as `/dev/xvde` and mounted at `/mnt/ebs`, because the standard AMI root drive is very small. The Docker path is there, and the kubelet directory is symlinked there (`/var/lib/kubelet -> /mnt/ebs/kubelet`); the layout is sketched after this list. I will test this deployment WITHOUT that EBS volume and the symlinks to verify.
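Roughly how the worker disk layout looks (illustrative commands, not my actual provisioning script):

```sh
# Second, non-Kubernetes-managed EBS volume attached as /dev/xvde and
# mounted at /mnt/ebs; the kubelet data directory is a symlink onto it.
mkfs.ext4 /dev/xvde
mkdir -p /mnt/ebs
mount /dev/xvde /mnt/ebs
mkdir -p /mnt/ebs/kubelet
ln -s /mnt/ebs/kubelet /var/lib/kubelet   # /var/lib/kubelet -> /mnt/ebs/kubelet
# Docker's data directory also lives under /mnt/ebs (details omitted here).
```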
About this issue
- State: closed
- Created 8 years ago
- Comments: 33 (18 by maintainers)
I’ve been repeatedly testing this one. This might be a case where a non-Kubernetes-managed EBS volume mounted on the instance causes issues. I have been testing with all the workers having ONLY a root volume (no `/mnt/ebs` as mentioned in the original issue). So far, no Kubernetes EBS volumes have gotten stuck in detaching. Will continue to test.