azure-container-networking: FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox
What happened:
CNI failed to set up the network:
Jan 29 06:41:25.617: INFO: At 2021-01-29 06:36:50 +0000 UTC - event for webserver-deployment-5575796f8f-k92dd: {kubelet 2884k8s001} Started: Started container httpd
Jan 29 06:41:25.617: INFO: At 2021-01-29 06:37:08 +0000 UTC - event for webserver-deployment-5575796f8f-9jj9z: {kubelet 2884k8s000} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "678fa65619f0dd4c0434d2b4e510bd99930bc62e8ea4b64ea8a5f79a73b20f6a": unexpected end of JSON input
Jan 29 06:41:25.617: INFO: At 2021-01-29 06:37:10 +0000 UTC - event for webserver-deployment-5575796f8f-6rbcl: {kubelet 2884k8s000} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cb4c4d0e705d57ac9da1eae9a42e2bb421dca03484fa8e0260672425feb55f15": unexpected end of JSON input
Jan 29 06:41:25.617: INFO: At 2021-01-29 06:37:10 +0000 UTC - event for webserver-deployment-5575796f8f-dtc4x: {kubelet 2884k8s000} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a20980651ba231f652be02984d594aa661c613d440b8dd740d4073f2bfa4d50b": unexpected end of JSON input
Jan 29 06:41:25.617: INFO: At 2021-01-29 06:37:10 +0000 UTC - event for webserver-deployment-5575796f8f-jjdlr: {kubelet 2884k8s000} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ba151015cc98dd643d424bf2eafaa71855fb4fb6b75fbb24bb4ea2616f964faf": unexpected end of JSON input
Jan 29 06:41:25.617: INFO: At 2021-01-29 06:37:10 +0000 UTC - event for webserver-deployment-5575796f8f-pst6x: {kubelet 2884k8s000} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5f40817e69457ddc2b907adfd182969bea93f24e337c84675d578aaeb17d2eb1": unexpected end of JSON input
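For context, "unexpected end of JSON input" is the error Go's encoding/json returns when asked to parse an empty or truncated document, which suggests the CNI plugin handed an empty (or partial) result back to the runtime. A minimal sketch that reproduces the message (the cniResult type is hypothetical, for illustration only):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniResult stands in for the structure the runtime expects back from the
// CNI plugin; the field is hypothetical and only here for illustration.
type cniResult struct {
	CNIVersion string `json:"cniVersion"`
}

func main() {
	var res cniResult
	// An empty (or truncated) plugin response reproduces the exact error
	// seen in the kubelet events above.
	if err := json.Unmarshal([]byte(""), &res); err != nil {
		fmt.Println(err) // unexpected end of JSON input
	}
}
```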
containerd failed to create the container task:
E0129 06:38:50.213490 4316 remote_runtime.go:251] StartContainer "1054e8a1180749dbb37097a5f985003ca3afb2036099f5df05020a5a9053a4ac" from runtime service failed: rpc error: code = Unknown desc = failed to create containerd task: hcsshim::CreateComputeSystem 1054e8a1180749dbb37097a5f985003ca3afb2036099f5df05020a5a9053a4ac: context deadline exceeded
(extra info: {"Owner":"containerd-shim-runhcs-v1.exe","SchemaVersion":{"Major":2,"Minor":1},"Container":{"GuestOs":{"HostName":"pod-adoption"},"Storage":{"Layers":[{"Id":"7b371bd9-de2e-5dc3-ae86-4d3d0378f6c6","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\113"},{"Id":"d83ac863-b163-5ceb-95c5-ac65bf7a39e3","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\112"},{"Id":"fd86a0ef-4ae8-5be6-b6bd-9e46685ad393","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\110"},{"Id":"9b944c46-c035-5757-9412-5fecc883941a","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\109"},{"Id":"9dee6370-72a7-5b65-af00-e08eeeaa1ef7","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\108"},{"Id":"77c8d266-544f-5418-b4e9-f49715060db9","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\107"},{"Id":"fa4d9b3a-212e-5b70-a9e6-bb1412fd1e53","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\106"},{"Id":"cc368b4b-70a7-5da3-aff7-b91d843db78a","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\105"},{"Id":"b7023ad6-4285-5a05-8998-e1c9c6bc6e49","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\104"},{"Id":"9c171b78-a39b-5e34-873e-04ab1b164bb7","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\79"},{"Id":"75b3f965-edc8-5443-ac0b-77f329589f05","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\78"},{"Id":"bc445432-f548-5054-b865-43ca4d3a32af","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\77"},{"Id":"e1a4a38f-7885-56a7-898b-96c1ceaf5cd5","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\76"},{"Id":"51c30dea-6447-5859-a549-0801b0fbf76c","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\75"},{"Id":"defbcf4b-cdf1-5e89-8198-0fe14410cffc","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\74"},{"Id":"b4e8aa06-2c0c-5738-80d8-af374eba772e","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\73"},{"Id":"59bb3c04-8829-58cb-ab1c-0aef684466cf","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\72"},{"Id":"a2ee96d5-66f6-5a9b-b0d4-384112c9e269","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\71"},{"Id":"2f347625-13a3-5633-badb-6868ea928ff9","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\70"},{"Id":"b03a06fa-412f-5917-a1f1-657de8a3bcc3","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\69"},{"Id":"83621030-7522-570b-ab7c-cf9c5898f09c","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\62"}],"Path":"\\\\?\\Volume{f8cd70db-970c-40de-aaaf-c07998d35301}\\"},"MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\8cacd155-a5f4-4c8a-b73c-9d3eb47469be\\containers\\pod-adoption\\c6e401e7","ContainerPath":"c:\\dev\\termination-log"},{"HostPath":"c:\\var\\lib\\kubelet\\pods\\8cacd155-a5f4-4c8a-b73c-9d3eb47469be\\volumes\\kubernetes.io~secret\\default-token-tmjhm","ContainerPath":"c:\\v
ar\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true}],"Networking":{"Namespace":"36666c6e-cdde-4370-bfd1-a36911baa06a"}},"ShouldTerminateOnLastHandleClosed":true}): unknown
K8s reference issue: https://github.com/kubernetes/kubernetes/issues/98574
What you expected to happen:
Pods are created and the network is set up.
How to reproduce it:
We use the following aks-engine template:
https://raw.githubusercontent.com/kubernetes-sigs/windows-testing/master/job-templates/kubernetes_containerd_master.json
Orchestrator and Version (e.g. Kubernetes, Docker):
Kubernetes 1.21 alpha
containerd 1.4.3
Operating System (Linux/Windows):
Windows
Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):
Microsoft Windows NT 10.0.17763.0
Source Description HotFixID InstalledBy InstalledOn
2884k8s000 Update KB4578966 10/13/2020 12:00:00 AM
2884k8s000 Security Update KB4577667 10/13/2020 12:00:00 AM
2884k8s000 Security Update KB4586793 2884k8s000\azureuser 11/12/2020 12:00:00 AM
Anything else we need to know?:
This appears to be a transient issue. The tests pass regularly: https://testgrid.k8s.io/sig-windows-releases#aks-engine-azure-windows-master-containerd
Unfortunately, the CNI logs with containerd end up in the wrong place. We made a fix to collect them, but it hasn't been rolled out yet.
About this issue
- State: closed
- Created 3 years ago
- Comments: 36 (16 by maintainers)
The fix is already included in all Windows Server 2022 builds. The planned ETA for the Windows Server 2019 fix is the July 19th update, but it could slip (tentative date).
This is still occurring regularly: https://github.com/kubernetes/kubernetes/issues/98574#issuecomment-1033114963
I managed to solve this; the problem for me was that I had both the Calico CNI and Linkerd installed at the same time. The main issue was that Linkerd started before Calico did, and even after Calico came online the CNI didn't pick that up, so pod restarts didn't solve it.
In our case we added an init container to Linkerd that waited for the Calico CNI to write its conflist before Linkerd could start (a minimal sketch of the idea follows): https://github.com/linkerd/linkerd2/pull/5916
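A minimal sketch of that "wait for the conflist before starting" idea, assuming the Calico conflist is written to /host/etc/cni/net.d; this is an illustration only, not the actual implementation from the linkerd PR:

```go
package main

import (
	"fmt"
	"log"
	"path/filepath"
	"time"
)

// waitForConflist polls dir until at least one *.conflist file appears or
// the timeout expires. The directory and timeout below are assumptions.
func waitForConflist(dir string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		matches, err := filepath.Glob(filepath.Join(dir, "*.conflist"))
		if err != nil {
			return err
		}
		if len(matches) > 0 {
			fmt.Println("found CNI config:", matches)
			return nil
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("no *.conflist appeared in %s within %s", dir, timeout)
}

func main() {
	if err := waitForConflist("/host/etc/cni/net.d", 5*time.Minute); err != nil {
		log.Fatal(err)
	}
}
```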
In short, this error can mean many things, but I hope this is a hint for someone else. I would recommend starting by looking in /host/etc/cni/net.d/ and trying to understand what's happening there.
@pjohnst5 thanks for bearing with us. We are finally collecting the correct logs and have the repro here:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-aks-engine-windows-containerd/1385422187535536128
kubelet logs:
Full Node logs: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/101357/pull-kubernetes-e2e-aks-engine-windows-containerd/1385422187535536128/artifacts/2726k8s001/
Azure CNI logs around the time of the error:
It looks like we are not collecting them properly; I will fix that and come back once it repros again. Thanks for taking a look!
Any chance you could include azure-vnet.log* in those logs? Saw that you made a fix to collect them.
A new version would most likely not fix the issue; we have a hunch about where this error is coming from, but a repro cluster or azure-vnet.log files would help us confirm that.
We are using Azure CNI v1.2.2. We updated to this version for this fix: https://github.com/Azure/azure-container-networking/pull/755
/cc @abelhu