kubernetes: Pods with hostNetwork=true cannot be removed (and generate errors) when using CNI

Kubernetes version (use kubectl version): 1.6.2 built from 1.6 branch at 85b9c963272248c0b88d06664e88eb3ef1645dfb

Environment:

  • Cloud provider or hardware configuration: bare metal
  • OS (e.g. from /etc/os-release): Ubuntu xenial
  • Kernel (e.g. uname -a): Linux fab7-compute-1 4.4.0-70-generic #91-Ubuntu SMP Wed Mar 22 12:47:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Others:

What happened: When using CNI, kubelet mishandles the teardown of pods that have hostNetwork=true. Kubelet logs errors such as:

StopPodSandbox "3fa9fc1247bed768521abe54f730a71a7412f17e8026975902e99f56ea4fd73c" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-proxy-cts6b_kube-system" network: CNI failed to retrieve network namespace path: Error: No such container: 3fa9fc1247bed768521abe54f730a71a7412f17e8026975902e99f56ea4fd73c

Deleting such a pod leaves it stuck in the Terminating state.

It appears that the teardown path does not check for hostNetwork=true, in which case CNI should not be invoked, and calls the plugin anyway.
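To illustrate the check that seems to be missing, here is a minimal sketch in Go. It is not the kubelet code: `PodSandbox`, `NetworkPlugin`, `failingPlugin`, and `stopSandboxNetwork` are hypothetical names invented for this example, and the plugin is simulated so that it always fails the way the log above does.

```go
package main

import (
	"errors"
	"fmt"
)

// PodSandbox is a hypothetical, simplified stand-in for the sandbox state
// kubelet tracks. It is NOT the real kubelet type.
type PodSandbox struct {
	ID          string
	HostNetwork bool
}

// NetworkPlugin is a hypothetical interface that models only the teardown call.
type NetworkPlugin interface {
	TearDownPod(sandboxID string) error
}

// failingPlugin simulates a CNI plugin that cannot find the network namespace
// and counts how often it was invoked.
type failingPlugin struct{ calls int }

func (p *failingPlugin) TearDownPod(sandboxID string) error {
	p.calls++
	return errors.New("CNI failed to retrieve network namespace path")
}

// stopSandboxNetwork invokes the plugin only for sandboxes that have their own
// network namespace. Host-network sandboxes were never set up through CNI, so
// there is nothing to tear down and the plugin is skipped.
func stopSandboxNetwork(plugin NetworkPlugin, sb PodSandbox) error {
	if sb.HostNetwork {
		return nil
	}
	if err := plugin.TearDownPod(sb.ID); err != nil {
		return fmt.Errorf("network teardown for sandbox %s failed: %w", sb.ID, err)
	}
	return nil
}

func main() {
	plugin := &failingPlugin{}
	err := stopSandboxNetwork(plugin, PodSandbox{ID: "3fa9fc12", HostNetwork: true})
	fmt.Println("error:", err, "plugin calls:", plugin.calls)
}
```

With a guard like this, a host-network pod never reaches the plugin, so a plugin that cannot find the network namespace would not block the pod's deletion.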

How to reproduce it (as minimally and precisely as possible): Create a pod, daemonset, or deployment with hostNetwork set to true in the pod spec.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 30 (25 by maintainers)

Most upvoted comments

True. I am just saying that maintaining the desired state does not solve the whole problem. Kubelet still triggers multiple DELs in some cases. The CNI plugin has to return success if there is nothing to delete. Otherwise, kubelet cannot proceed.
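A plugin-side version of "return success if there is nothing to delete" could look roughly like the sketch below. `attachmentStore` and its methods are hypothetical and keep state in memory only; a real plugin would persist its state somewhere and release actual resources.

```go
package main

import (
	"fmt"
	"sync"
)

// attachmentStore is a hypothetical in-memory record of the containers a
// plugin has configured. It exists only for this illustration.
type attachmentStore struct {
	mu       sync.Mutex
	attached map[string]bool
}

func newAttachmentStore() *attachmentStore {
	return &attachmentStore{attached: make(map[string]bool)}
}

// Add records that networking was set up for the container.
func (s *attachmentStore) Add(containerID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.attached[containerID] = true
}

// Del releases the container's attachment. A DEL for an unknown or already
// removed container is treated as a successful no-op, so repeated DELs from
// the caller never fail.
func (s *attachmentStore) Del(containerID string) (removed bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.attached[containerID] {
		return false, nil // nothing to delete: report success
	}
	delete(s.attached, containerID)
	return true, nil
}

func main() {
	store := newAttachmentStore()
	store.Add("3fa9fc12")
	for i := 0; i < 2; i++ {
		removed, err := store.Del("3fa9fc12")
		fmt.Println("removed:", removed, "err:", err)
	}
}
```

The second `Del` call finds nothing and still returns a nil error, which is the behavior that lets the caller proceed.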

I’m of the opinion that kubelet should do the right thing and not call DEL when either (a) DEL is already in progress, or (b) DEL has already been called. Kubelet has to be responsible for its own operations; it can’t punt this off to network plugins that really have nothing to do with kubelet.
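As a rough sketch of conditions (a) and (b), the caller could track per-sandbox DEL state and skip redundant calls. Everything here (`delTracker`, `TearDownOnce`, the state names) is hypothetical and not taken from kubelet. The state lives in memory only, so it would not survive a restart.

```go
package main

import (
	"fmt"
	"sync"
)

// delState is a hypothetical per-sandbox teardown state.
type delState int

const (
	delNotStarted delState = iota
	delInProgress
	delDone
)

// delTracker remembers which sandboxes have a DEL in flight or completed.
type delTracker struct {
	mu    sync.Mutex
	state map[string]delState
}

func newDelTracker() *delTracker {
	return &delTracker{state: make(map[string]delState)}
}

// TearDownOnce calls del only if no DEL is in progress and none has already
// succeeded for this sandbox. It reports whether del was actually invoked.
// A failed DEL resets the state so that a later call can retry.
func (t *delTracker) TearDownOnce(sandboxID string, del func() error) (invoked bool, err error) {
	t.mu.Lock()
	if t.state[sandboxID] != delNotStarted {
		t.mu.Unlock()
		return false, nil // (a) in progress or (b) already called
	}
	t.state[sandboxID] = delInProgress
	t.mu.Unlock()

	err = del() // run the plugin call outside the lock

	t.mu.Lock()
	defer t.mu.Unlock()
	if err != nil {
		delete(t.state, sandboxID) // allow a retry after failure
		return true, err
	}
	t.state[sandboxID] = delDone
	return true, nil
}

func main() {
	tracker := newDelTracker()
	del := func() error { fmt.Println("DEL invoked"); return nil }
	for i := 0; i < 3; i++ {
		invoked, err := tracker.TearDownOnce("3fa9fc12", del)
		fmt.Println("invoked:", invoked, "err:", err)
	}
}
```

Only the first loop iteration reaches the plugin. The later calls return immediately without invoking it.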

Something has to keep state. Either that’s kubelet, through the apiserver or the checkpoints, or it’s the network plugins. And not all network plugins are smart. When we spin kubenet out to a CNI chain, what’s going to keep state there, if network plugins need to keep state? Nothing… DEL should be best-effort, but we gotta make kubelet better behaved here too…

Tested using 1.6.4; still seeing this issue…

This is an issue with all released versions of 1.6; I don’t understand why you would consider it a blocker to release 1.7.0.