kubernetes: kubelet doesn't restart Pods on the hostNetwork if CNI isn't initialized
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Kubernetes version (use `kubectl version`): HEAD
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others: kubeadm
How to reproduce on Ubuntu, for example:

```shell
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial-unstable main
EOF
apt-get update && apt-get install -y docker.io kubeadm
kubeadm init --kubernetes-version v1.6.0-beta.4
```
What happened:

When something in a static Pod's manifest changes in a way that requires the pod infra container to be restarted, for example:

```shell
sed -e "s|spec:|spec:\n  dnsPolicy: ClusterFirstWithHostNet|" -i /etc/kubernetes/manifests/kube-apiserver.yaml
```

kubelet fails to tear the old sandbox down, logging errors like these:
```
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.577643 26734 kuberuntime_gc.go:138] Failed to stop sandbox "640d0649f8ad383a2438cd17ee6f8b5b2a847462461e5e59987537778d604220" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.580730 26734 remote_runtime.go:109] StopPodSandbox "6c70b0c81f7c751b0abad687ca35a140b9c834c56d01a56b99020eaa2673e206" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pdnfc_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.580745 26734 kuberuntime_gc.go:138] Failed to stop sandbox "6c70b0c81f7c751b0abad687ca35a140b9c834c56d01a56b99020eaa2673e206" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pdnfc_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.596422 26734 remote_runtime.go:109] StopPodSandbox "a4a478447bea9939cc538db24737c47e50ebfcb80188c570e392e7c6e5b42d14" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.596443 26734 kuberuntime_gc.go:138] Failed to stop sandbox "a4a478447bea9939cc538db24737c47e50ebfcb80188c570e392e7c6e5b42d14" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.597515 26734 remote_runtime.go:109] StopPodSandbox "a8c81b8a3de45c5ee4dabcc7d2dcde6a85d5df896f51d83f483705140f576729" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pwv8x_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.597535 26734 kuberuntime_gc.go:138] Failed to stop sandbox "a8c81b8a3de45c5ee4dabcc7d2dcde6a85d5df896f51d83f483705140f576729" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-pwv8x_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.624209 26734 remote_runtime.go:109] StopPodSandbox "df9d016b47220b5d56bce5d71f8e607be0f0fee73d95a83f14247b0fddc1221f" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-vqvrq_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.624239 26734 kuberuntime_gc.go:138] Failed to stop sandbox "df9d016b47220b5d56bce5d71f8e607be0f0fee73d95a83f14247b0fddc1221f" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-vqvrq_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.626239 26734 remote_runtime.go:109] StopPodSandbox "e6be13d131e76024bb6633f113f44215e9ed375af332233b0fcdc02ec8a0da38" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-k3vj8_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.626271 26734 kuberuntime_gc.go:138] Failed to stop sandbox "e6be13d131e76024bb6633f113f44215e9ed375af332233b0fcdc02ec8a0da38" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-k3vj8_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.627350 26734 remote_runtime.go:109] StopPodSandbox "edc38912ecb02b7d06607d95f45a7158123a62c61c32ae1de2135c5c45646f22" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-hq9gg_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.627367 26734 kuberuntime_gc.go:138] Failed to stop sandbox "edc38912ecb02b7d06607d95f45a7158123a62c61c32ae1de2135c5c45646f22" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "self-hosted-kube-apiserver-hq9gg_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.631330 26734 remote_runtime.go:109] StopPodSandbox "f6a61129c8231043080728d7c13dcb32860c52d9b0ac2926f549614120170934" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
Mar 20 17:40:49 rook-test kubelet-patched[26734]: E0320 17:40:49.631341 26734 kuberuntime_gc.go:138] Failed to stop sandbox "f6a61129c8231043080728d7c13dcb32860c52d9b0ac2926f549614120170934" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-apiserver-rook-test_kube-system" network: cni config uninitialized
```
What you expected to happen:

I expected the apiserver Pod to restart as usual (it works when CRI is disabled).
How to reproduce it (as minimally and precisely as possible):
Described above
I think the change needed here will be pretty minimal, see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go#L119
The problem is probably somewhere around here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go#L181
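The shape of the fix suggested above can be sketched roughly as follows. This is not the actual dockershim code — the type and function names here are hypothetical — but it illustrates the guard being asked for: a host-network sandbox never received a CNI setup call, so tearing it down must not require the CNI config to be initialized.

```go
package main

import "fmt"

// sandboxInfo is a hypothetical, simplified view of the metadata the
// runtime keeps for each pod sandbox.
type sandboxInfo struct {
	id          string
	hostNetwork bool
}

// teardownNetwork sketches the guard: CNI teardown is only attempted for
// sandboxes that actually own a pod network namespace. Host-network pods
// share the host's namespace, so there is nothing for CNI to tear down
// and an uninitialized CNI config should not block their deletion.
func teardownNetwork(sb sandboxInfo, cniReady bool) error {
	if sb.hostNetwork {
		// Nothing to tear down; skip the network plugin entirely.
		return nil
	}
	if !cniReady {
		return fmt.Errorf("NetworkPlugin cni failed to teardown pod %q: cni config uninitialized", sb.id)
	}
	// In the real code, plugin.TearDownPod(...) would be called here.
	return nil
}

func main() {
	// Host-network pod with broken CNI: teardown still succeeds.
	fmt.Println(teardownNetwork(sandboxInfo{id: "kube-apiserver", hostNetwork: true}, false))
	// Regular pod with broken CNI: teardown fails, as in the logs above.
	fmt.Println(teardownNetwork(sandboxInfo{id: "kube-dns", hostNetwork: false}, false))
}
```

The point is that the `hostNetwork` check happens before any CNI call, which is exactly what the log excerpts show is missing.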
This is pretty serious; I would consider it blocking for the v1.6 release. cc @kubernetes/release-team
@kubernetes/sig-network-misc @kubernetes/sig-node-bugs @yujuhong
About this issue
- State: closed
- Created 7 years ago
- Comments: 17 (13 by maintainers)
I just tested this with kubeadm+v1.6.0, and my experience was a bit different from what you described.

1. `kubeadm init` properly started kubelet and all master components.
2. kubeadm kept waiting for the node to become ready, and would not move on to the next phase to create the kube-proxy/-dns addons.

For (2), the node would stay "not ready" until the network is properly configured, to prevent the scheduler from assigning any pod to the node. I believe this is working as intended (introduced in #43474).

As for (3), if the CNI configuration would only be populated after the node becomes ready (assuming this is the case; correct me if I am wrong @luxas @jbeda), the setup process would be inherently flawed because of the circular dependency between (2) and (3).
@kensimon hostNetwork pods should run regardless of whether the node is ready or not. If you don't see all the master components running and ready (`kubectl get pods --all-namespaces`), could you run `kubelet version` to verify kubelet's version?

Already looking. Trying to reproduce the issue.
dockershim should not be calling teardown for a pod with host network. I suspect `PodSandboxStatus` is returning an error (for whatever reason), but need to confirm.

Can this be reopened? Regardless of whether kubeadm is doing the wrong thing, kubelet should not fail to delete a `hostNetwork` pod just because the CNI state is broken.

I ran into other issues while trying to reproduce this, but I never got the "cni config uninitialized" error to show up…
The main issue @bowei and I ran into was that kubelet did not restart the static pod after we modified the manifest file. This was likely caused by inotify events not being processed properly (or being dropped). I've asked @Random-Liu to help look into the issue.
For static pods, any change to the pod spec should result in a completely new pod UID, i.e., it should be treated as a new pod. The creation of the new pod should not be blocked by the teardown of the old pod. This is strange.
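The "new UID per spec change" behavior described above can be illustrated with a small sketch. This is not kubelet's actual implementation (the real kubelet hashes the decoded pod object, not the raw file bytes, and the function name here is made up), but it shows why editing a static pod's manifest must always yield a logically new pod:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// staticPodUID sketches the idea: the UID is a pure function of the
// manifest source and its contents, so any edit to the file produces a
// different UID, i.e. a brand-new pod from kubelet's point of view.
func staticPodUID(source string, manifest []byte) string {
	h := sha256.Sum256(append([]byte(source+"\x00"), manifest...))
	return fmt.Sprintf("%x", h[:8])
}

func main() {
	path := "/etc/kubernetes/manifests/kube-apiserver.yaml"
	before := staticPodUID(path, []byte("spec: {}"))
	after := staticPodUID(path, []byte("spec:\n  dnsPolicy: ClusterFirstWithHostNet"))
	// Different manifest contents give different UIDs.
	fmt.Println(before != after)
}
```

Because the UIDs differ, creating the new pod should be independent of the old pod's teardown, which is why the observed blocking behavior is surprising.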
Taking a look at the kubeadm repro