containerd: failed to start or create containerd task
Description
Running Kubernetes conformance testing against a cluster with the containerd runtime sometimes fails because a pod does not start during one of the test cases. The general error is "failed to start containerd task" or "failed to create containerd task". More detailed errors include the following:
ttrpc: closed: unknown
read: connection reset by peer: unknown
failed to start io pipe copy: unable to copy pipes: containerd-shim: opening w/o fifo ... failed: context deadline exceeded
Steps to reproduce the issue:
Option 1: Follow https://github.com/cncf/k8s-conformance/blob/master/instructions.md#running to run Kubernetes conformance testing via sonobuoy.
Option 2: Follow https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md#running-conformance-tests to run Kubernetes conformance testing via kubetest.
Heavier load on the cluster (e.g., running conformance tests in parallel) makes the problem easier to reproduce. In general, however, the problem is difficult to reproduce because the failure rate is low. For example, re-running the conformance tests after a failure usually succeeds.
Describe the results you received:
See description.
Describe the results you expected:
The Kubernetes conformance tests pass because containerd retries the failed task.
Output of containerd --version:
We’ve seen this on various containerd 1.2.x and 1.3.x versions.
Any other relevant information:
We’ve been monitoring these failures since October 2019, although they may have started well before that.
About this issue
- State: closed
- Created 4 years ago
- Comments: 22 (14 by maintainers)
Commits related to this issue
- Int tests: Warn (instead of erroring) upon pod restarts, part two In #4595 we stopped failing integration tests whenever a pod restarted just once, which is being caused by containerd/containerd#4068... — committed to linkerd/linkerd2 by alpeb 4 years ago
- Int tests: Warn (instead of erroring) upon pod restarts, part two (#4637) In #4595 we stopped failing integration tests whenever a pod restarted just once, which is being caused by containerd/contai... — committed to linkerd/linkerd2 by alpeb 4 years ago
- test: non-fatal containerd task issue https://github.com/containerd/containerd/issues/4068 caused a container start to fail and get retried, which then broke tests because of our "no container restar... — committed to pohly/pmem-CSI by pohly 4 years ago
Hello, I also noticed this issue on an EKS node:
Pod’s status:
Thanks for the feedback! Closing.
Tracking on the Kubernetes side in https://github.com/kubernetes/kubernetes/issues/89064