containerd: failed to start or create containerd task

Description

Running Kubernetes conformance tests against a cluster that uses the containerd runtime sometimes fails because a pod does not start during one of the test cases. The general error is "failed to start containerd task" or "failed to create containerd task". More detailed errors include the following:

  • ttrpc: closed: unknown
  • read: connection reset by peer: unknown
  • failed to start io pipe copy: unable to copy pipes: containerd-shim: opening w/o fifo ... failed: context deadline exceeded

Steps to reproduce the issue:

Option 1: Follow https://github.com/cncf/k8s-conformance/blob/master/instructions.md#running to run Kubernetes conformance testing via sonobuoy.

Option 2: Follow https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md#running-conformance-tests to run Kubernetes conformance testing via kubetest.

The more load on the cluster (e.g., running conformance tests in parallel), the easier the problem is to reproduce. However, the problem is generally difficult to reproduce because the failure rate is low. For example, re-running the conformance tests after a failure usually succeeds.

Describe the results you received:

See description.

Describe the results you expected:

The Kubernetes conformance tests pass because containerd retries the failed task.
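
To make the expectation concrete, here is a minimal sketch of the kind of retry behavior meant here. It is not containerd code: `start_task` is a hypothetical callable standing in for whatever creates and starts the task, the signature list is only the set of substrings from the errors quoted in the description, and the attempt count and backoff are arbitrary assumptions.

```python
import time

# Substrings taken from the error messages quoted in this issue.
# Treating them as "transient" is an assumption made for illustration.
TRANSIENT_SIGNATURES = (
    "ttrpc: closed",
    "connection reset by peer",
    "context deadline exceeded",
)


def is_transient(message: str) -> bool:
    """Return True if the message contains one of the signatures above."""
    return any(sig in message for sig in TRANSIENT_SIGNATURES)


def start_with_retry(start_task, attempts=3, delay=0.5, sleep=time.sleep):
    """Call start_task() until it succeeds or fails with an unrecognized error.

    start_task is a hypothetical stand-in, not a containerd API. It is
    expected to raise RuntimeError with the runtime's error text on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return start_task()
        except RuntimeError as err:
            # Give up on the last attempt or on errors we do not recognize.
            if attempt == attempts or not is_transient(str(err)):
                raise
            sleep(delay * attempt)  # linear backoff between attempts
```

With a wrapper like this, a start that fails once with "ttrpc: closed: unknown" and then succeeds would not surface as a failed test case, while an unrecognized error would still be raised immediately.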

Output of containerd --version:

We’ve seen this on various containerd 1.2.x and 1.3.x versions.

Any other relevant information:

We’ve noticed and have been monitoring these failures since October 2019, although they could have started long before that.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 22 (14 by maintainers)

Most upvoted comments

Hello, I also noticed this issue on an EKS node:

  Kernel Version:             5.10.178-162.673.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.19
  Kubelet Version:            v1.24.11-eks-a59e1f0
  Kube-Proxy Version:         v1.24.11-eks-a59e1f0

Pod’s status:

  Last State:   Terminated
    Reason:     StartError
    Message:    failed to create containerd task: failed to create shim task: context canceled: unknown
    Exit Code:  128

Thanks for the feedback! Closing.