containerd: 'failed to reserve container name'
Description
Hi!
We are running containerd on GKE with pretty much all defaults. A dozen nodes and a few hundred pods. Plenty of memory and disk free.
We started to have many pods fail with the "failed to reserve container name" error in the last week or so. I do not recall any specific changes to the cluster or the containers themselves.
Any help will be greatly appreciated!
Steps to reproduce the issue: I have no clue how to specifically reproduce this issue.
The cluster has nothing special and the deployment is straightforward. The only thing that could be relevant is that our images are quite large, around 3 GB.
There are a few more details here: https://serverfault.com/questions/1036683/gke-context-deadline-exceeded-createcontainererror-and-failed-to-reserve-contai
Describe the results you received:
2020-10-07T08:01:45Z Successfully assigned default/apps-abcd-6b6cb5876b-nn9md to gke-bap-mtl-1-preemptible-e2-s4-e6a8ddb4-ng3v I
2020-10-07T08:01:50Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:16:45Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:18:45Z Error: context deadline exceeded W
2020-10-07T08:18:45Z Container image "redis:4.0-alpine" already present on machine I
2020-10-07T08:18:53Z Created container redis I
2020-10-07T08:18:53Z Started container redis I
2020-10-07T08:18:53Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:02Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:02Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:03Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:20Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:20Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:21Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:34Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:34Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:35Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:44Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:44Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:54Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:08Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:08Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:20:18Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:30Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:30Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:21:19Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:26:35Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:31:36Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:36:26Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:41:18Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:46:41Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
Describe the results you expected: Live a happy life, error free.
Output of containerd --version:
containerd github.com/containerd/containerd 1.3.2 ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
Any other relevant information:
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 28
- Comments: 50 (14 by maintainers)
Links to this issue
Commits related to this issue
- oci: use readonly mount to read user/group info In linux kernel, the umount writable-mountpoint will try to do sync-fs to make sure that the dirty pages to the underlying filesystems. The many number... — committed to fuweid/containerd by fuweid 2 years ago
- oci: use readonly mount to read user/group info In linux kernel, the umount writable-mountpoint will try to do sync-fs to make sure that the dirty pages to the underlying filesystems. The many number... — committed to qiutongs/containerd by fuweid 2 years ago
- oci: use readonly mount to read user/group info In linux kernel, the umount writable-mountpoint will try to do sync-fs to make sure that the dirty pages to the underlying filesystems. The many number... — committed to wllenyj/containerd by fuweid 2 years ago
- oci: use readonly mount to read user/group info In linux kernel, the umount writable-mountpoint will try to do sync-fs to make sure that the dirty pages to the underlying filesystems. The many number... — committed to katiewasnothere/containerd by fuweid 2 years ago
Hi Matti, I am from GKE. We are fully aware of this issue and are prioritizing it.
Just had this happen to me on GKE.
Unfortunately, the only working solution is to move back to cos with docker. Amazing to see that this critical bug was opened more than one year ago, and still no fix.
Here is my current analysis. I will keep updating this comment.
Summary
The "failed to reserve container name" error is returned by containerd CRI if there is an in-flight CreateContainer request reserving the same container name (like below).
T1: 1st CreateContainer(XYZ) request is sent. (Timeout on Kubelet side)
T2: 2nd CreateContainer(XYZ) request is sent (Kubelet retry)
T3: 2nd CreateContainer request returns "failed to reserve container name XYZ" error
T4: 1st CreateContainer request is still in-flight…
It simply indicates the CreateContainer request is slower than the configurable --runtime-request-timeout (default 2min).
Based on my observation and investigation so far, I found the following facts:
- Given sufficient time, the container and pod will be created successfully, as long as you are using restartPolicy:Always or restartPolicy:OnFailure in PodSpec. (Yes, restartPolicy affects the behavior of container creation.)
Mitigation
- Use restartPolicy:Always or restartPolicy:OnFailure in PodSpec (a minimal sketch follows below)
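To make that mitigation concrete, here is a minimal sketch using the Kubernetes Go API types from k8s.io/api; the pod, container, and image names are placeholders loosely based on the events above, not the reporter's actual manifests:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "apps-abcd", Namespace: "default"},
		Spec: corev1.PodSpec{
			// RestartPolicyAlways (or RestartPolicyOnFailure) is the mitigation:
			// kubelet keeps syncing the pod, so once the slow in-flight
			// CreateContainer finally completes, the container comes up on a
			// later SyncPod iteration instead of the pod being abandoned.
			RestartPolicy: corev1.RestartPolicyAlways,
			Containers: []corev1.Container{
				{Name: "web", Image: "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e"},
			},
		},
	}
	fmt.Println("restartPolicy:", pod.Spec.RestartPolicy)
}
```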
Theory 1
Expected symptom: some pods are up and generate heavy IO, but others are not.
Docker has a similar mechanism of "reserving the container name" to prevent conflicts. However, dockershim handles it in a different way from the containerd CRI implementation.
https://github.com/kubernetes/kubernetes/blob/release-1.19/pkg/kubelet/dockershim/helpers.go#L284
In my experiment, it keeps hitting the "randomize the container name" case. That means every Kubelet retry will create a new container name in dockershim. Containerd, however, sticks to a single container name, so all subsequent retries are doomed to fail while the initial request is in flight.
In conclusion, dockershim retries more aggressively. Therefore, docker has a higher chance of creating the container successfully, and much faster, than containerd.
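To make that difference concrete, here is a rough sketch of the "randomize the container name on conflict" recovery; it is an illustration only, not the dockershim code linked above (createContainer is a stand-in and the suffix format is made up): a retry gets a fresh name instead of colliding with the reserved one.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"strings"
)

var errNameReserved = errors.New("failed to reserve container name")

// createContainer stands in for the real engine call; here it pretends the
// plain name is still reserved by an in-flight request.
func createContainer(name string) error {
	if !strings.Contains(name, "_random-") {
		return errNameReserved
	}
	return nil
}

// createWithConflictRecovery retries once with a randomized name when the
// original name is reserved, so the retry is not doomed to collide with the
// still-in-flight request.
func createWithConflictRecovery(name string) error {
	err := createContainer(name)
	if err == nil || !errors.Is(err, errNameReserved) {
		return err
	}
	randomized := fmt.Sprintf("%s_random-%08x", name, rand.Uint32())
	return createContainer(randomized)
}

func main() {
	fmt.Println(createWithConflictRecovery("web_apps-abcd_default_0")) // prints <nil>
}
```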
Theory 2
Expected symptom: none of the pods come up.
Containerd has worse image-pull control than docker. For example, it may pull too many images in parallel, which generates more disk IO.
(No code reference found yet.)
Reproducing the Problem
Unfortunately, I haven't found a way to reproduce a situation where docker is consistently superior to containerd.
Experiment for Theory 1
Setup: uses the stress-ng tool.
Execution:
Expected Result:
Actual Result:
Experiment for Theory 2
Setup:
Execution:
alpine ubuntu python busybox redis node mysql nginx httpd mongo memcached postgres mariadb wordpress influxdb consul rabbitmq debian amazonlinux cassandra
Expected Result:
Actual Result:
Need Help from Community
Misc
jotting down some notes here, apologies if it's lengthy:
Let me try to explain/figure out the reason you got "failed to reserve container name"…
Kubelet tried to create a container that it had already asked containerd to create at least once. When containerd tried the first time, it received a variable in the container create metadata named attempt, and that variable held the default value 0. Containerd then reserved the unique name for attempt 0 that you see in your log (see the _0 at the end of the name): "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0".
Then something happened causing a context timeout between kubelet and containerd. The kubelet context timeout value is configurable: --runtime-request-timeout duration (default: 2m0s). A 2-minute timeout could happen for any number of reasons: an unusually long garbage collection, a file system hiccup, locked files, deadlocks while waiting, some very expensive init operation occurring on the node for one of your other containers… who knows? That's why we have/need recovery procedures.
What should have happened is that kubelet should have incremented the attempt number (or at least that's how I see it from this side, the containerd side, of the CRI API). But kubelet did not increment the attempt number, and furthermore containerd was still trying to create the container from the first request. Or the create on the containerd side may even be finished at this point; it is possible the timeout only happened on the kubelet side and containerd continued finishing the create, possibly even attempting to return the success result. If containerd had actually failed, it would have deleted the reservation for that container ID, since the immediate thing after we reserve the ID in containerd is to defer its removal on any error in the create: https://github.com/containerd/containerd/blob/master/pkg/cri/server/container_create.go#L65-L84
So ok… skimming over the kubelet code… I believe this is the code that decides what attempt number we are on? https://github.com/kubernetes/kubernetes/blame/master/pkg/kubelet/kuberuntime/kuberuntime_container.go#L173-L292
In my skim I think I see a window where kubelet will try attempt 0 a second time after the first create attempt fails with a context timeout. But I may be reading the code wrong? @dims @feiskyer @Random-Liu
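For readers who haven't looked at that create path, here is a toy sketch of the reserve-then-defer-release pattern the linked container_create.go uses; the store, names, and error text are simplified stand-ins, not containerd's actual implementation. The name is reserved up front and only released if the create fails, so a second request for the same name while the first is still in flight gets the reservation error.

```go
package main

import (
	"fmt"
	"sync"
)

// nameStore is a toy reservation table: container name -> container ID.
type nameStore struct {
	mu       sync.Mutex
	reserved map[string]string
}

func (s *nameStore) Reserve(name, id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if owner, ok := s.reserved[name]; ok {
		return fmt.Errorf("failed to reserve container name %q: name %q is reserved for %q", name, name, owner)
	}
	s.reserved[name] = id
	return nil
}

func (s *nameStore) Release(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.reserved, name)
}

func createContainer(s *nameStore, name, id string) (retErr error) {
	if err := s.Reserve(name, id); err != nil {
		return err
	}
	// Release the reservation only if the create fails; on success (or while
	// the create is still running) the name stays reserved, which is exactly
	// what a retrying kubelet bumps into.
	defer func() {
		if retErr != nil {
			s.Release(name)
		}
	}()
	// ... slow work (snapshots, spec generation, etc.) would happen here ...
	return nil
}

func main() {
	s := &nameStore{reserved: map[string]string{}}
	fmt.Println(createContainer(s, "web_pod_default_0", "8b21a987")) // <nil>
	fmt.Println(createContainer(s, "web_pod_default_0", "deadbeef")) // reservation error
}
```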
On GCP, only for a little while longer, though. Just got an email:
I only reproduce this issue (always) when scaling multiple heavy (both in terms of image size and the processes launched) pods.
Happened to me too on GKE
Same problem here
containerd github.com/containerd/containerd 1.4.6 d71fcd7d8303cbf684402823e425e9dd2e99285d
Amazon EKS 1.21
Bumped into this issue as well. Switching back to cos with docker.
Same for us; once we switched back to cos with docker, everything worked.
Summary (2022/02)
The "failed to reserve container name" error is returned by containerd CRI if there is an in-flight CreateContainer request reserving the same container name (like below).
T1: 1st CreateContainer(XYZ) request is sent. (Timeout on Kubelet side)
T2: 2nd CreateContainer(XYZ) request is sent (Kubelet retry)
T3: 2nd CreateContainer request returns "failed to reserve container name XYZ" error
T4: 1st CreateContainer request is still in-flight…
Don't panic. Given sufficient time, the container and pod will be created successfully, as long as you are using restartPolicy:Always or restartPolicy:OnFailure in PodSpec.
Root Cause and Fix
Slow disk operations (e.g. disk throttling on GKE) are the culprit. The heavy disk IO can come from a number of sources: the user's disk-heavy workload, large image pulls, and the containerd CRI implementation itself.
An unnecessary sync-fs operation was found in the CreateContainer stack; it is where CreateContainer gets stuck. The sync-fs is removed in https://github.com/containerd/containerd/pull/6478. Not only does it make CreateContainer return faster, it also reduces the disk IO generated by containerd.
Please note there are perhaps other undiscovered reasons contributing to this problem.
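For context on that fix (the "oci: use readonly mount to read user/group info" commits listed near the top of this thread), here is a rough sketch of the idea behind it; this is not containerd's code, it assumes Linux with root privileges, and the device, mount point, and ext4 filesystem are hypothetical.

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// readPasswdReadonly mounts a filesystem read-only just to read a file from it.
// MS_RDONLY is the key flag: a read-only mount has no dirty pages to flush, so
// the later unmount cannot get stuck behind a sync while the disk is overloaded.
func readPasswdReadonly(device, target string) ([]byte, error) {
	if err := unix.Mount(device, target, "ext4", unix.MS_RDONLY, ""); err != nil {
		return nil, fmt.Errorf("mount: %w", err)
	}
	defer unix.Unmount(target, 0)
	return os.ReadFile(target + "/etc/passwd")
}

func main() {
	data, err := readPasswdReadonly("/dev/sdb1", "/mnt/rootfs") // hypothetical paths
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("read %d bytes of /etc/passwd\n", len(data))
}
```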
Mitigation
- Use restartPolicy:Always or restartPolicy:OnFailure in PodSpec
Amended Theory 1
(See the original theory 1 in https://github.com/containerd/containerd/issues/4604#issuecomment-1006013231)
Docker has a similar mechanism of "reserving the container name" to prevent conflicts. However, dockershim handles it in a different way from the containerd CRI implementation.
https://github.com/kubernetes/kubernetes/blob/release-1.19/pkg/kubelet/dockershim/helpers.go#L284
In fact, this difference in retry behavior leads to significantly different CRI request rates between dockershim and containerd. In containerd, the CreateContainer request comes about every 10s-20s (see the example in https://github.com/containerd/containerd/issues/4604#issue-716346199). But in the dockershim case, the CreateContainer request comes about every 2min. This is because requests that hit "failed to reserve name" fail fast in containerd, while requests with a new container name can take 2min in dockershim. This applies to RunPodSandbox as well. Therefore, the load of CRI requests in containerd is 10x the load in dockershim, and I infer this makes the node further overloaded.
This theory echoes a similar bug solved in CRI-O - https://bugzilla.redhat.com/show_bug.cgi?id=1785399 - in which the solution says "Now, when systems are under load, CRI-O does everything it can to slow down the Kubelet and reduce load on the system."
I believe our direction is also to slow down Kubelet when it sends too many requests. This might be aligned with Mike's comment: https://github.com/containerd/containerd/issues/4604#issuecomment-1013268187
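As a sketch of that "slow down" direction (an illustration only: createContainer is a stand-in and the delays are arbitrary; this is neither kubelet nor CRI-O code), exponential backoff on the fast-failing reservation error would look roughly like this:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNameReserved = errors.New("failed to reserve container name")

// createContainer stands in for the CRI call; here it always fails fast with
// the reservation error, the way a retry does while the first request is stuck.
func createContainer(name string) error { return errNameReserved }

// createWithBackoff retries the fast-failing call with exponentially growing
// delays so a struggling node is not flooded with CRI requests every few seconds.
func createWithBackoff(name string, maxRetries int) error {
	delay := 2 * time.Second
	for i := 0; i < maxRetries; i++ {
		err := createContainer(name)
		if err == nil || !errors.Is(err, errNameReserved) {
			return err
		}
		fmt.Printf("attempt %d failed (%v); backing off %s\n", i+1, err, delay)
		time.Sleep(delay)
		if delay < time.Minute {
			delay *= 2
		}
	}
	return fmt.Errorf("giving up on %q after %d attempts", name, maxRetries)
}

func main() {
	fmt.Println(createWithBackoff("web_pod_default_0", 3))
}
```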
Happened to me. It's really a serious bug when you are running your GitLab CI/CD runners on containerd-based k8s, because some pipelines are designed to run multiple containers in parallel and this bug happens very often. Is going back to docker really the only option here?
@mikebrow I investigated a reported issue in k/k before https://github.com/kubernetes/kubernetes/issues/94085
My summary is that kubelet has correct logic for incrementing the restart number, which is set to "current_restart + 1". See this kubelet code.
- If the CreateContainer request eventually succeeds on the containerd side, kubelet will see it and increment the restart count on the next iteration of SyncPod. The pod will eventually be ready.
- If the CreateContainer request eventually fails on the containerd side, containerd should release the name. On the next iteration of Kubelet SyncPod, it shouldn't see the "failed to reserve container name" error.
- If the CreateContainer request is stuck on the containerd side, the name is never released. Then kubelet will keep seeing the "failed to reserve container name" error.
@fuweid
Thanks for your time on this issue.
Unfortunately, I did stop using COS back in 2020 after we could not find a solution.
I'm 97% sure we were using overlayfs, and as for the rest I have no way to find this historical data. Sorry about that.
Hi, @matti and @kubino148 and @sadortun and all subscribers, would you mind providing the goroutine stack of containerd when you see the error? Thanks.
kill -USR1 $(pidof containerd)
will trigger the dump; then check the containerd log to get the stack. (A sketch of this dump mechanism appears below.)
Ran into this on KinD (kindest/node:v1.21, single node) when disk IO was higher than expected during tests, which I suspect was caused/exacerbated by creating too many pods at once. Creating fewer pods at once still didn't work at first, but restarting containerd and kubelet (in that order) caused those few pods to come up as expected. I was then able to slowly scale all of the test pods back up to their expected replica counts without a problem. My guess is that once this error occurs, kubelet and containerd are "stuck", but restarting them appears to "un-stuck" them. No idea if this has any applicability to an actual production environment.
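For reference, the SIGUSR1 stack dump requested a couple of comments above works roughly like the following minimal sketch on Linux; this is a generic Go daemon pattern, not containerd's actual code.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	go func() {
		for range sigs {
			// Dump the stacks of all goroutines (the `true` argument) to the log,
			// which is what you then read out of the daemon's log after kill -USR1.
			buf := make([]byte, 1<<20)
			n := runtime.Stack(buf, true)
			log.Printf("=== goroutine dump ===\n%s", buf[:n])
		}
	}()
	select {} // keep running like a daemon, waiting for signals
}
```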
This also happens with UBUNTU_CONTAINERD, not just COS_CONTAINERD
I confirm that moving back to docker solves the problem:
gcloud container clusters upgrade mycluster --image-type cos --node-pool mynodepool --zone myzone