kubernetes: kubelet doesn't restart hostNetwork Pods if CNI isn't initialized

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version): HEAD

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others: kubeadm

How to reproduce (on Ubuntu, for example):

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial-unstable main
EOF
apt-get update && apt-get install -y docker.io kubeadm
kubeadm init --kubernetes-version v1.6.0-beta.4

What happened:

When something in a static Pod manifest changes in a way that requires the pod infra container to be recreated, for example:

sed -e "s|spec:|spec:\n  dnsPolicy: ClusterFirstWithHostNet|" -i /etc/kubernetes/manifests/kube-apiserver.yaml

the kubelet fails to tear down the old sandboxes and repeatedly logs errors like these:
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.577643   26734 kuberuntime_gc.go:138] Failed to stop sandbox "640d0649f8ad383a2438cd17ee6f8b5b2a847462461e5e59987537778d604220" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.580730   26734 remote_runtime.go:109] StopPodSandbox "6c70b0c81f7c751b0abad687ca35a140b9c834c56d01a56b99020eaa2673e206" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pdnfc_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.580745   26734 kuberuntime_gc.go:138] Failed to stop sandbox "6c70b0c81f7c751b0abad687ca35a140b9c834c56d01a56b99020eaa2673e206" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pdnfc_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.596422   26734 remote_runtime.go:109] StopPodSandbox "a4a478447bea9939cc538db24737c47e50ebfcb80188c570e392e7c6e5b42d14" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.596443   26734 kuberuntime_gc.go:138] Failed to stop sandbox "a4a478447bea9939cc538db24737c47e50ebfcb80188c570e392e7c6e5b42d14" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.597515   26734 remote_runtime.go:109] StopPodSandbox "a8c81b8a3de45c5ee4dabcc7d2dcde6a85d5df896f51d83f483705140f576729" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pwv8x_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.597535   26734 kuberuntime_gc.go:138] Failed to stop sandbox "a8c81b8a3de45c5ee4dabcc7d2dcde6a85d5df896f51d83f483705140f576729" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pwv8x_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.624209   26734 remote_runtime.go:109] StopPodSandbox "df9d016b47220b5d56bce5d71f8e607be0f0fee73d95a83f14247b0fddc1221f" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-vqvrq_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.624239   26734 kuberuntime_gc.go:138] Failed to stop sandbox "df9d016b47220b5d56bce5d71f8e607be0f0fee73d95a83f14247b0fddc1221f" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-vqvrq_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.626239   26734 remote_runtime.go:109] StopPodSandbox "e6be13d131e76024bb6633f113f44215e9ed375af332233b0fcdc02ec8a0da38" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-k3vj8_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.626271   26734 kuberuntime_gc.go:138] Failed to stop sandbox "e6be13d131e76024bb6633f113f44215e9ed375af332233b0fcdc02ec8a0da38" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-k3vj8_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.627350   26734 remote_runtime.go:109] StopPodSandbox "edc38912ecb02b7d06607d95f45a7158123a62c61c32ae1de2135c5c45646f22" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-hq9gg_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.627367   26734 kuberuntime_gc.go:138] Failed to stop sandbox "edc38912ecb02b7d06607d95f45a7158123a62c61c32ae1de2135c5c45646f22" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-hq9gg_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.631330   26734 remote_runtime.go:109] StopPodSandbox "f6a61129c8231043080728d7c13dcb32860c52d9b0ac2926f549614120170934" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.631341   26734 kuberuntime_gc.go:138] Failed to stop sandbox "f6a61129c8231043080728d7c13dcb32860c52d9b0ac2926f549614120170934" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized

What you expected to happen:

I expected the apiserver Pod to restart as usual (it does when CRI is disabled).

How to reproduce it (as minimally and precisely as possible):

Described above

I think the change here will be pretty minimal; see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go#L119

The problem is probably somewhere around here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go#L181
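
For concreteness, here is a minimal sketch in Go of the kind of guard I have in mind. The types and names below (networkPlugin, sandboxInfo, stopPodSandbox) are stand-ins for illustration only, not the actual dockershim API:

package main

import "fmt"

// Illustrative stand-ins for the dockershim pieces involved; these are
// not the real kubelet types or method names.
type networkPlugin interface {
    TearDownPod(namespace, name, sandboxID string) error
}

type sandboxInfo struct {
    Namespace   string
    Name        string
    HostNetwork bool // true when the pod shares the host's network namespace
}

// stopPodSandbox sketches the guard being proposed: CNI teardown is only
// attempted for sandboxes the network plugin actually set up, so an
// uninitialized CNI config cannot block stopping a hostNetwork pod.
func stopPodSandbox(plugin networkPlugin, info sandboxInfo, sandboxID string) error {
    if !info.HostNetwork {
        if err := plugin.TearDownPod(info.Namespace, info.Name, sandboxID); err != nil {
            return fmt.Errorf("failed to tear down network for sandbox %q: %v", sandboxID, err)
        }
    }
    // Stopping the sandbox container itself proceeds regardless of the
    // network plugin's state.
    fmt.Printf("stopping sandbox %s\n", sandboxID)
    return nil
}

func main() {
    // A hostNetwork sandbox should stop cleanly even with no CNI plugin.
    info := sandboxInfo{Namespace: "kube-system", Name: "kube-apiserver", HostNetwork: true}
    if err := stopPodSandbox(nil, info, "640d0649f8ad"); err != nil {
        fmt.Println(err)
    }
}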

This is pretty serious; I would consider it a possible blocker for the v1.6 release. cc @kubernetes/release-team

@kubernetes/sig-network-misc @kubernetes/sig-node-bugs @yujuhong

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

It seems that requiring CNI setup for pods even when they use hostNetwork is preventing a new cluster from initializing, at least for me.

I just tested this with kubeadm+v1.6.0, and my experience was a bit different from what you described.

  1. kubeadm init properly started kubelet and all master components.
  2. All master components are running, but node/kubelet stays “NotReady”.
  3. kubeadm keeps waiting for the node to become ready and does not move on to the next phase of creating the kube-proxy/kube-dns addons.

For (2), the node stays “NotReady” until the network is properly configured, in order to prevent the scheduler from assigning any pods to the node. I believe this is working as intended (introduced in #43474).

As for (3), if the CNI configuration is only populated after the node becomes ready (assuming this is the case; correct me if I am wrong @luxas @jbeda), the setup process is inherently flawed because of the circular dependency between (2) and (3).

@kensimon hostNetwork pods should run regardless of whether the node is ready. If you don’t see all the master components running and ready (kubectl get pods --all-namespaces), could you run kubelet --version to verify kubelet’s version?

@yujuhong PTAL, I agree this could be a blocker

Already looking. Trying to reproduce the issue.

dockershim should not be calling network teardown for a pod with host networking. I suspect PodSandboxStatus is returning an error (for whatever reason), but I need to confirm.
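
To make the suspicion concrete, a hypothetical distillation of the decision involved (the function and its inputs are made up for illustration; the real code paths differ):

package main

import (
    "errors"
    "fmt"
)

// needsNetworkTeardown is a hypothetical distillation of the decision
// dockershim makes while stopping a sandbox; the real code paths differ
// in their details.
func needsNetworkTeardown(hostNetwork bool, statusErr error) bool {
    if statusErr != nil {
        // The status lookup failed, so the caller no longer knows whether
        // the sandbox used hostNetwork. Defaulting to "tear down" here
        // would explain the errors above for hostNetwork pods.
        return true
    }
    return !hostNetwork
}

func main() {
    // A hostNetwork pod whose status lookup fails gets sent into CNI
    // teardown anyway, which then fails with "cni config uninitialized".
    fmt.Println(needsNetworkTeardown(true, errors.New("sandbox status unavailable")))
}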

Can this be reopened? Regardless of whether kubeadm is doing the wrong thing, kubelet should not fail to delete a hostNetwork pod just because the CNI state is broken.

I ran into other issues while trying to reproduce this, but I never got the “cni config uninitialized” error to show up…

The main issue @bowei and I ran into was that kubelet did not restart the static pod after we modified the manifest file. This was likely caused by inotify events not being processed properly (or being dropped). I’ve asked @Random-Liu to help look into the issue.
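
For anyone who wants to watch the file events directly, here is a minimal sketch using the fsnotify Go library; kubelet’s actual file source is more involved, so this only illustrates the mechanism that may be dropping events:

package main

import (
    "log"

    "github.com/fsnotify/fsnotify"
)

func main() {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatal(err)
    }
    defer watcher.Close()

    // Watch the static pod manifest directory from the repro above.
    if err := watcher.Add("/etc/kubernetes/manifests"); err != nil {
        log.Fatal(err)
    }

    // Print every event; if an edit to kube-apiserver.yaml never shows up
    // here, the events are being lost before they reach the watcher.
    for {
        select {
        case event := <-watcher.Events:
            log.Printf("event: %s %s", event.Op, event.Name)
        case err := <-watcher.Errors:
            log.Printf("watch error: %v", err)
        }
    }
}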

For static pods, any change to the pod spec should result in a completely new pod UID, i.e., it should be treated as a new pod. The creation of the new pod should not be blocked by the teardown of the old pod. This is strange.
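
To make the UID point concrete, a minimal sketch of the idea, assuming a content hash (the real kubelet hashes the decoded pod object together with its source, not the raw file bytes as done here):

package main

import (
    "crypto/md5"
    "fmt"
)

// staticPodUID illustrates the idea behind static pod identity: derive
// the UID from the pod's content, so any manifest change yields a new
// UID and therefore a brand-new pod. (The real kubelet hashes the
// decoded pod object together with its source, not the raw file bytes.)
func staticPodUID(nodeName string, manifest []byte) string {
    h := md5.New()
    fmt.Fprintf(h, "host:%s", nodeName)
    h.Write(manifest)
    return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
    before := []byte("spec:\n  hostNetwork: true\n")
    after := []byte("spec:\n  dnsPolicy: ClusterFirstWithHostNet\n  hostNetwork: true\n")
    // The sed edit from the repro changes the manifest bytes, so the two
    // UIDs differ and the result should be treated as a new pod.
    fmt.Println(staticPodUID("rook-test", before))
    fmt.Println(staticPodUID("rook-test", after))
}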

Taking a look at the kubeadm repro