kubernetes: GCE/Windows: error while ProvisionEndpoint(): Element not found
On GCE, we have been seeing occasional flakes caused by failures to create the pod sandbox. The error message usually shows “Element not found”.
Example:
Mar 15 22:21:25.309: INFO: At 2019-03-15 22:16:14 +0000 UTC - event for pod-configmaps-f27c8d25-476f-11e9-9871-a605168c5ad3: {default-scheduler } Scheduled: Successfully assigned configmap-4540/pod-configmaps-f27c8d25-476f-11e9-9871-a605168c5ad3 to e2e-72-a4c99-windows-node-group-r3pv
Mar 15 22:21:25.310: INFO: At 2019-03-15 22:16:35 +0000 UTC - event for pod-configmaps-f27c8d25-476f-11e9-9871-a605168c5ad3: {kubelet e2e-72-a4c99-windows-node-group-r3pv} FailedCreatePodSandBox: Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "09370920b7a896b81560cbedc127a629621c5cdb5d53e1c504af4a181d8869a4" network for pod "pod-configmaps-f27c8d25-476f-11e9-9871-a605168c5ad3": NetworkPlugin cni failed to set up pod "pod-configmaps-f27c8d25-476f-11e9-9871-a605168c5ad3_configmap-4540" network: error while ProvisionEndpoint(09370920b7a896b81560cbedc127a629621c5cdb5d53e1c504af4a181d8869a4_l2bridge,990998DF-0CAD-4F52-B347-1961D26C3F85,09370920b7a896b81560cbedc127a629621c5cdb5d53e1c504af4a181d8869a4): Element not found., failed to clean up sandbox container "09370920b7a896b81560cbedc127a629621c5cdb5d53e1c504af4a181d8869a4" network for pod "pod-configmaps-f27c8d25-476f-11e9-9871-a605168c5ad3": NetworkPlugin cni failed to teardown pod "pod-configmaps-f27c8d25-476f-11e9-9871-a605168c5ad3_configmap-4540" network: failed to find HNSEndpoint 09370920b7a896b81560cbedc127a629621c5cdb5d53e1c504af4a181d8869a4_l2bridge: Endpoint 09370920b7a896b81560cbedc127a629621c5cdb5d53e1c504af4a181d8869a4_l2bridge not found]
I believe we are using a patched win-bridge CNI plugin from https://github.com/nagiesek/plugins/tree/k8sDnsFix/plugins/main/windows/win-bridge (@pjh, please correct me if I’m wrong).
- OS: Windows 1809
- Docker: 18.09.2
This is the cause of the majority of flakes in our test job.
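To gauge how often this signature shows up in a run, something like the following could be used to scan event lines. This is only a rough sketch: the function name and the regular expression are my own assumptions, derived from the single event sample above, and are not part of any Kubernetes or CNI tooling.

```python
import re

# Assumed failure signature, based only on the example event in this issue.
# Other variants of the error may exist and would not be matched.
SIGNATURE = re.compile(
    r"FailedCreatePodSandBox.*error while ProvisionEndpoint\(.*?\): Element not found"
)


def count_provision_endpoint_flakes(lines):
    """Count log lines that match the 'Element not found' sandbox failure."""
    return sum(1 for line in lines if SIGNATURE.search(line))
```

Feeding it the lines of a test log (for example, `count_provision_endpoint_flakes(open(path))`, where `path` is a hypothetical log file) would give a rough per-run count of this flake.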
About this issue
- State: closed
- Created 5 years ago
- Comments: 33 (32 by maintainers)
Thanks a lot, @pjh. I’ll take a look later and update.
Sorry for the delay. I expect I can start running our clusters with the suggested PR (https://github.com/containernetworking/plugins/pull/286) this evening.