kubernetes: When a Pod with a PV is moved to another node, it is stuck in ContainerCreating for a long time
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
When I move a Pod with a `nodeSelector:` expression to another node of the Kubernetes cluster, the Pod waits about 8 minutes in the "ContainerCreating" status.
Errors:
Warning FailedAttachVolume Multi-Attach error for volume "pvc-7ec40eec-949e-11e7-b96d-fa163ef575ff" Volume is already exclusively attached to one node and can't be attached to another
Multi-Attach error for volume "pvc-7ec40eec-949e-11e7-b96d-fa163ef575ff" (UniqueName: "kubernetes.io/cinder/ab54e390-cace-466f-8624-bdb270fa49ff") from node "knode3" Volume is already exclusively attached to one node and can't be attached to another
After 6 minutes the OpenStack Cinder volume is attached to the selected node and the Pod is initialized. For an application this delay is far too long.
What you expected to happen:
It is expected that after the Pod is ordered to move to another node, the Cinder volume is moved to the selected node as well and the Pod starts quickly.
How to reproduce it (as minimally and precisely as possible):
Move a Pod with a Persistent Volume (OpenStack Cinder) to another node of the Kubernetes cluster.
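A minimal sketch of such a reproduction, assuming hypothetical names and node labels (the Deployment name, claim name and `apps/v1` API group are not from the reporter's cluster; the 1.5/1.7-era clusters in this report would use the older beta API groups): the Pod mounts a Cinder-backed PVC, and changing the `nodeSelector` reschedules it while the volume is still attached to the old node.

```yaml
# Hypothetical reproduction manifest: a single-replica Deployment whose Pod
# mounts a Cinder-backed PVC. Changing nodeSelector (e.g. from knode2 to
# knode3) and re-applying it reschedules the Pod while the volume is still
# attached to the old node, producing the Multi-Attach error above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pv-move-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pv-move-demo
  template:
    metadata:
      labels:
        app: pv-move-demo
    spec:
      nodeSelector:
        kubernetes.io/hostname: knode3   # was knode2 before the move
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pv-move-demo-pvc    # PVC bound to an OpenStack Cinder volume
```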
Anything else we need to know?:
Log file kubelet: kubelet.txt
Log file kube-controller-manager: kube-controller-manager.txt
Environment:
- Kubernetes version (use `kubectl version`):
  Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"269f928217957e7126dc87e6adfa82242bfe5b1e", GitTreeState:"clean", BuildDate:"2017-07-03T15:31:10Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T08:56:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: OpenStack Mitaka
- OS (e.g. from /etc/os-release): NAME="CentOS Linux" VERSION="7 (Core)" ID="centos"
- Kernel (e.g. `uname -a`): Linux knode2 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
About this issue
- State: closed
- Created 7 years ago
- Reactions: 24
- Comments: 59 (21 by maintainers)
The lesson learned here: don't use Kubernetes for databases.
I have a similar issue on DigitalOcean. If a pod is scheduled onto another node during a deployment, it breaks: the current node and pod are already linked, and the old pod will not detach its volume before the new one is attached.
FIX attempt 1: set `RollingUpdate` with `maxUnavailable: 100%` --> FAILED
FIX attempt 2: FIX 1 + add affinity to deploy the pod only to one node --> SUCCESS
This means that your service will be offline for a few seconds, and you will not be able to use the cluster to scale the service to different nodes.
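A rough sketch of what that combination can look like, with hypothetical names, labels and image (this is an interpretation of the two fixes described above, not the commenter's actual manifest):

```yaml
# Hypothetical sketch of "FIX 1 + FIX 2": allow the old replica to be taken
# down before the new one starts, and pin the pod to a single node so the
# ReadWriteOnce volume never has to move between nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-node-app
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%   # FIX 1: old pod may go away before the new one is up
      maxSurge: 0
  selector:
    matchLabels:
      app: single-node-app
  template:
    metadata:
      labels:
        app: single-node-app
    spec:
      affinity:              # FIX 2: keep the pod on one specific node
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker-1
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: app-data
```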
DigitalOcean volumes, like many others, support only ReadWriteOnce. That means we need to find a better solution, because deploying to one node and accepting downtime is not what Kubernetes is about, and it heavily undermines the entire idea of persistent volumes.
Version:
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
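For context on the ReadWriteOnce point above, the access mode is declared on the claim itself; a minimal, hypothetical PVC of the kind used with such block-storage providers:

```yaml
# Hypothetical PVC: ReadWriteOnce block volumes (DigitalOcean volumes, Cinder,
# EBS, RBD, ...) can be attached to only one node at a time, which is exactly
# why the Multi-Attach error above cannot be satisfied during a rolling update.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```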
I’ve got the same issue with k8s 1.11.0 and Ceph using dynamic provisioning. This issue also occurs when I do a `kubectl apply -f deployment.yml`. As such it’s not possible to modify something without redeploying using delete/apply… 😦 (For me it took much longer than 6 min.)
@fejta-bot: Closing this issue.
IMO this "bug" exists for all volume types. If you have a pod with a PVC (any type, RWX types excluded) running on node1 and you shut down node1, the pod will be started again on some other node, but failing over the volume takes 6-10 minutes (it returns that Multi-Attach error) because it waits for the force detach.
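On clusters that expose the VolumeAttachment API, one way to watch that failover from the outside is to check where the volume is still attached while the new pod waits; a sketch with a hypothetical pod name:

```sh
# See which node each volume is still attached to while the new pod is stuck.
kubectl get volumeattachments
# Confirm the Multi-Attach error on the rescheduled pod (hypothetical name).
kubectl describe pod my-app-pod | grep -A 2 FailedAttachVolume
# Or list the corresponding events directly.
kubectl get events --field-selector reason=FailedAttachVolume
```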
I have the same issue on k8s 1.17.2, with rook-ceph as storage. One worker node gets turned off, the pod is evicted after 5 minutes, but it cannot start because "Volume is exclusively used … by the old pod". The old pod gets stuck in "Terminating". Workaround: kill the old pod, kill the new pod, wait, see that the new pod is still unable to start, and kill the new pod again. Pretty weak for a cluster solution.
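For reference, "killing" a pod that is stuck in Terminating on a dead node usually means a force delete; a sketch with hypothetical pod names:

```sh
# Remove the old pod's API object without waiting for the unreachable kubelet.
kubectl delete pod old-pod --grace-period=0 --force
# Then delete the new pod so its controller recreates it and retries the attach.
kubectl delete pod new-pod
```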
@rootfs @jsafrane @thockin do you guys have any idea how we could improve this situation? This volume mount problem has been a problem for a long time. I have tried to solve this twice, but the storage or node SIGs always say that my solution is incorrect.
We have a customer who runs cronjobs every 5 minutes, and those jobs have a volume in them as well. You can imagine what happens when you ask volumes to mount every 5 minutes while the force detach time is 6 minutes. I think we can modify the force detach time in the cluster, but that still does not remove the problem. It seems that this volume mount problem exists in all cloud providers; sometimes it takes 5-20 minutes to get the volume in place. 20 minutes is a huge amount of time if your application is running in production.
edit: there is another issue for this #65392 (it might solve some of these issues)
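To make the cronjob scenario concrete, a hypothetical sketch (names, image and schedule invented): a job runs every 5 minutes and mounts a ReadWriteOnce PVC, so any run scheduled onto a node where the volume cannot be attached yet can sit behind the ~6-minute force detach.

```yaml
# Hypothetical CronJob illustrating the scenario above. batch/v1 is shown;
# the older cluster versions discussed in this thread would use batch/v1beta1.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-job
spec:
  schedule: "*/5 * * * *"      # a new pod (and volume attach) every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox
            command: ["sh", "-c", "date >> /data/report.log"]
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: report-data   # ReadWriteOnce claim
```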
Got the same error. We modified the resource requests/limits for one StatefulSet with 3 replicas. K8s moved one of the replicas to another node, which had enough resources, but the volume was still attached to the old node.
K8s version: v1.8.1+coreos.0, running on AWS
Warning FailedAttachVolume 7m (x2987 over 12m) attachdetach Multi-Attach error for volume "pvc-4fe430e8-db4d-11e7-9931-02138f142c30" Volume is already exclusively attached to one node and can't be attached to another
What is the status of this on AWS/EBS? I have the same problem on AWS with v1.9.3.
Having the same issue on DigitalOcean; there are two things involved:
1. `RollingUpdate` vs `Recreate`. Obviously, for zero downtime `RollingUpdate` is preferred: it keeps the old pod until the new pod is ready. Here comes the problem: the new pod fails with "Multi-Attach error for volume "pvc-xxx" Volume is already used by pod(s) xxx". Changing to `Recreate` seems to eliminate this error, which makes sense: it destroys the old pod first, leaving some downtime, but it ensures the volume is completely detached before the new pod is scheduled and attaches it. Not sure if @MichaelOrtho's FIX 1 equals `Recreate`, but like @MichaelOrtho said, this defeats one of k8s's main purposes, zero downtime. What I would ideally like to see is that with `RollingUpdate`, k8s can transfer the volume attachment from the old pod to the new pod. Is this a bug, or is it just not possible and an expected limitation of k8s's `RollingUpdate`?
2. `ReadWriteOnce`, which only allows the volume to be mounted on one node. This error occurs even if your update strategy is `Recreate`. The current workaround is, as @MichaelOrtho mentioned, to add affinity to ensure scheduling on the same node. The question is: is this a bug in k8s? At least for `Recreate`, can k8s detach the volume from one node/old pod and attach it to another node/new pod?
@adipascu and other people on StackOverflow mentioned `StatefulSet` for stateful apps; I haven't tried it yet. If the above are not considered k8s bugs, then from a user/developer experience perspective I really think we should either disable PVC support on Deployments completely, or, if a PVC is used, configure (or enforce) `Recreate` and affinity for the user by default, or at least highlight this in the Deployment documentation and guide people to use `StatefulSet`, since users will absolutely hit this wall when using a PVC on a Deployment.
/reopen /remove-lifecycle rotten
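For reference, switching a Deployment from the default `RollingUpdate` to `Recreate`, as discussed above, is a small strategy change; a minimal, hypothetical sketch:

```yaml
# Hypothetical Deployment using the Recreate strategy: the old pod is deleted
# (and its ReadWriteOnce volume detached) before the replacement is created,
# at the cost of downtime during every rollout.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-ish-app
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: stateful-ish-app
  template:
    metadata:
      labels:
        app: stateful-ish-app
    spec:
      containers:
      - name: db
        image: postgres:11
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-data
```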
I am still having this exact issue on version v1.12.8 on Google Kubernetes Engine. It happens when I run `kubectl apply -f app.yaml` and make a pod recreate itself. My current fix is to run `kubectl delete -f app.yaml` beforehand to release the disk and to wait a bit before recreating the pod. How is this still not fixed? Am I using Kubernetes incorrectly?
Edit: I think `StatefulSet` should solve this issue.
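Spelled out as commands, that delete-then-apply workaround might look like this (hypothetical manifest name and wait time):

```sh
# Delete the workload first so the disk is detached from its current node...
kubectl delete -f app.yaml
# ...give the cloud provider time to actually detach the volume...
sleep 120
# ...then recreate the workload so the fresh pod can attach the disk cleanly.
kubectl apply -f app.yaml
```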
Just experienced the same on Digital Ocean. The pod is still in ContainerCreating after 13 mins…
postgres-deployment-77c874df64-k4hn9 0/1 ContainerCreating 0 13m
We are facing the same issue with k8s 1.9.8 and RBD volumes, but in our case the pod was just redeployed on another node due to changes made via `kubectl edit deployment ...`
I think I just experienced the same issue on AWS …