kubernetes: Pods with hostNetwork=true cannot be removed (and generate errors) when using CNI
Kubernetes version (use kubectl version):
1.6.2 built from 1.6 branch at 85b9c963272248c0b88d06664e88eb3ef1645dfb
Environment:
- Cloud provider or hardware configuration: bare metal
- OS (e.g. from /etc/os-release): Ubuntu xenial
- Kernel (e.g. uname -a): Linux fab7-compute-1 4.4.0-70-generic #91-Ubuntu SMP Wed Mar 22 12:47:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm
- Others:
What happened: When using CNI, kubelet mishandles pods that have hostNetwork=true and tries to tear down their network through CNI. Kubelet logs errors such as:
StopPodSandbox "3fa9fc1247bed768521abe54f730a71a7412f17e8026975902e99f56ea4fd73c" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-proxy-cts6b_kube-system" network: CNI failed to retrieve network namespace path: Error: No such container: 3fa9fc1247bed768521abe54f730a71a7412f17e8026975902e99f56ea4fd73c
Deleting these pods leaves them stuck in the Terminating state.
It appears that the teardown path does not recognize that CNI should not be invoked for pods with hostNetwork=true, and invokes it anyway.
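The expected behavior can be sketched in Go, the language kubelet is written in. This is a minimal illustration under stated assumptions, not kubelet's actual code: `sandboxInfo`, `networkPlugin`, `recordingPlugin` and `stopSandboxNetwork` are hypothetical names. It only shows the guard that appears to be missing, namely that a sandbox sharing the host's network namespace is skipped before any CNI teardown is attempted.

```go
package main

import "fmt"

// sandboxInfo is a hypothetical, simplified view of what kubelet knows
// about a pod sandbox at teardown time. It is not a real Kubernetes type.
type sandboxInfo struct {
	ID          string
	PodName     string
	HostNetwork bool
}

// networkPlugin is a hypothetical stand-in for the CNI plugin interface.
type networkPlugin interface {
	TearDownPod(podName, sandboxID string) error
}

// recordingPlugin records every teardown call it receives (for illustration).
type recordingPlugin struct {
	calls []string
}

func (p *recordingPlugin) TearDownPod(podName, sandboxID string) error {
	p.calls = append(p.calls, sandboxID)
	return nil
}

// stopSandboxNetwork tears down pod networking only for sandboxes that were
// given their own network namespace. hostNetwork pods share the node's
// namespace, so there is nothing for CNI to release.
func stopSandboxNetwork(s sandboxInfo, plugin networkPlugin) error {
	if s.HostNetwork {
		return nil // skip CNI entirely for hostNetwork=true
	}
	if err := plugin.TearDownPod(s.PodName, s.ID); err != nil {
		return fmt.Errorf("teardown of %q failed: %w", s.PodName, err)
	}
	return nil
}

func main() {
	p := &recordingPlugin{}
	_ = stopSandboxNetwork(sandboxInfo{ID: "3fa9fc12", PodName: "kube-proxy-cts6b_kube-system", HostNetwork: true}, p)
	_ = stopSandboxNetwork(sandboxInfo{ID: "abc123", PodName: "web-1_default"}, p)
	fmt.Println(p.calls) // prints [abc123]
}
```

Note that this guard only helps while kubelet can still tell whether the sandbox used the host network; in the log above the container is already gone, which is why that property also needs to be persisted.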
How to reproduce it (as minimally and precisely as possible): Create a pod, daemonset, or deployment with hostNetwork set to true in the pod spec.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 3
- Comments: 30 (25 by maintainers)
Commits related to this issue
- Merge pull request #46823 from dcbw/fix-up-runtime-GetNetNS2 Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276) kubelet/network: report but tolerate errors r... — committed to kubernetes/kubernetes by deleted user 7 years ago
- dockershim: checkpoint HostNetwork property To ensure kubelet doesn't attempt network teardown on HostNetwork containers that no longer exist but are still checkpointed, make sure we preserve the Hos... — committed to dcbw/kubernetes by dcbw 7 years ago
- Merge pull request #47850 from dcbw/checkpoint-hostnetwork Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284) dockershim: checkpoint HostNetwork property To... — committed to jcbsmpsn/kubernetes by deleted user 7 years ago
- dockershim: checkpoint HostNetwork property To ensure kubelet doesn't attempt network teardown on HostNetwork containers that no longer exist but are still checkpointed, make sure we preserve the Hos... — committed to rajatchopra/origin by dcbw 7 years ago
- UPSTREAM: drop: fix for bz1507257 hacked from upstream PR47850, drop these changes in favour of that PR because this one does not carry the entire dependent chain. Conflicts were removed manually. do... — committed to rajatchopra/origin by dcbw 7 years ago
- Merge pull request #17097 from rajatchopra/dockershim Automatic merge from submit-queue. UPSTREAM: 47850: fix for bz1507257 hacked from upstream PR47850 'drop' these changes in favour of upstream P... — committed to openshift/origin by openshift-merge-robot 7 years ago
I’m of the opinion that kubelet should do the right thing and not call DEL when either (a) DEL is already in progress, or (b) DEL has already been called. Kubelet has to be responsible for its own operations; it can’t punt this off to network plugins that really have nothing to do with kubelet.
Something has to keep state. Either that’s kubelet, through the apiserver or the checkpoints, or it’s the network plugins. And not all network plugins are smart. When we spin kubenet out to a CNI chain, what’s going to keep state there, if network plugins need to keep state? Nothing… DEL should be best-effort, but we have to make kubelet better behaved here too…
Tested using 1.6.4; still seeing this issue…
This is an issue with all released versions of 1.6; I don’t understand why you would consider it a blocker to release 1.7.0.