cilium: CiliumEndpoint missing for a pod
cilium version: 1.9.4 kubelet: 1.20
After upgrading a cluster, we noticed that the CiliumEndpoint for a running pod is missing. We dug into it and think it may be a bug in the way Cilium manages CiliumEndpoints. Here is the timeline:
- An existing StatefulSet pod alertmanager-0 is running on a node.
- The cluster goes through an upgrade; both cilium-agent and alertmanager-0 are upgraded.
- kubelet fails to remove the old container for alertmanager-0 because cilium-agent is being restarted at the same time (so the CNI plugin is missing):
Aug 03 04:08:11 bmut-dozrr9-0803-024754-871f3-acp2 kubelet[48398]: E0803 04:08:11.853703 48398 pod_workers.go:191] Error syncing pod 3f7c5201-edc9-4b83-a6a2-470289ae89ac ("alertmanager-0_kube-system(3f7c5201-edc9-4b83-a6a2-470289ae89ac)"), skipping: error killing pod: failed to "KillPodSandbox" for "3f7c5201-edc9-4b83-a6a2-470289ae89ac" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"a580c1ab31c3b1c66cbc5d542f9eb340bd0494449c2d832c1bda6e27636ed949\": failed to find plugin \"cilium-cni\" in path [/opt/cni/bin]"
- cilium-agent comes back up and starts to restore existing endpoints:
2021-08-03T04:08:34.378283278Z level=info msg="New endpoint" containerID=a580c1ab31 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=277 identity=60609 ipv4=192.168.1.165 ipv6= k8sPodName=kube-system/alertmanager-0 subsys=endpoint
- Almost at the same time, kubelet creates a new container for the pod:
2021-08-03T04:08:34.737428463Z level=info msg="Create endpoint request" addressing="&{192.168.1.85 78bda739-f410-11eb-92f1-42010a800063 }" containerID=f4a2e88f66aabf348196605c0b4d430bf6f3b4f660224f984bd16053b07b6b23 datapathConfiguration="<nil>" interface=lxc368fadda4150 k8sPodName=kube-system/alertmanager-0 labels="[]" subsys=daemon sync-build=true
Note that the container ID is not the same, so kubelet is creating a new container for the same pod.
- After ~30 seconds, kubelet tries to remove the old container of the pod:
2021-08-03T04:09:00.533694827Z level=info msg="Delete endpoint request" id="container-id:a580c1ab31c3b1c66cbc5d542f9eb340bd0494449c2d832c1bda6e27636ed949" subsys=daemon
2021-08-03T04:09:00.534015683Z level=info msg="Releasing key" key="[k8s:app=alertmanager k8s:io.cilium.k8s.policy.cluster=default k8s:io.cilium.k8s.policy.serviceaccount=alertmanager k8s:io.kubernetes.pod.namespace=kube-system k8s:statefulset.kubernetes.io/pod-name=alertmanager-0]" subsys=allocator
2021-08-03T04:09:00.539698529Z level=info msg="Removed endpoint" containerID=a580c1ab31 datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=277 identity=60609 ipv4=192.168.1.165 ipv6= k8sPodName=kube-system/alertmanager-0 subsys=endpoint
Note that the container ID in the request is the old one.
- cilium-agent removes the CiliumEndpoint even though the pod is still running.
After all of these steps, we end up with alertmanager-0 running fine but without a CiliumEndpoint. I guess the issue is that when we process the delete-endpoint RPC request, we don't check that the container ID matches the one recorded for the endpoint, so when kubelet removes a stale container, the CiliumEndpoint for a still-running pod gets deleted. Shall we add this check?
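To make the proposal concrete, here is a hedged sketch of the kind of guard being suggested. The `Endpoint` struct, the `byPod` map, and `handleDeleteEndpoint` are illustrative stand-ins, not Cilium's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// Endpoint is an illustrative stand-in for Cilium's per-pod endpoint state;
// the real struct lives in pkg/endpoint and has many more fields.
type Endpoint struct {
	ContainerID string
	K8sPodName  string
}

// handleDeleteEndpoint sketches the proposed check: only remove the endpoint
// (and, by extension, its CiliumEndpoint object) when the container ID in the
// delete request matches the container currently backing the endpoint.
func handleDeleteEndpoint(byPod map[string]*Endpoint, podName, reqContainerID string) error {
	ep, ok := byPod[podName]
	if !ok {
		return fmt.Errorf("no endpoint for pod %s", podName)
	}
	// Delete requests may carry a truncated container ID, so compare prefixes.
	if !strings.HasPrefix(ep.ContainerID, reqContainerID) {
		// Stale request: kubelet is tearing down an old sandbox while a newer
		// container already owns this pod's endpoint. Do not touch the CEP.
		return nil
	}
	delete(byPod, podName)
	return nil
}
```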
@christarazi @Weil0ng We should only delete the CEP with the precondition that its UID is the same as when the CEP was created. Something like this:
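(The snippet that followed in the original comment isn't preserved in this extract; below is a minimal sketch of such a UID-preconditioned delete, assuming Cilium's generated CiliumV2 clientset and that the CEP's UID was recorded at creation time. Names like `deleteCEPIfUnchanged` are illustrative.)

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	ciliumclient "github.com/cilium/cilium/pkg/k8s/client/clientset/versioned"
)

// deleteCEPIfUnchanged deletes the CiliumEndpoint only if its UID still
// matches the UID observed when the CEP was created. The apiserver rejects
// the delete if the object has since been replaced (e.g. re-created for a
// newer instance of the same pod name).
func deleteCEPIfUnchanged(ctx context.Context, c ciliumclient.Interface,
	namespace, name string, cepUIDAtCreation types.UID) error {
	return c.CiliumV2().CiliumEndpoints(namespace).Delete(ctx, name, metav1.DeleteOptions{
		Preconditions: &metav1.Preconditions{UID: &cepUIDAtCreation},
	})
}
```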
The CEP object is only removed because it has the ownerReference set to the backing pod, so that their lifecycles are tied together. Basically, the timeline of events for an endpoint delete is:
- kubectl delete pod app1
- CNI DEL to Cilium
- CNI DEL has completed
- CEP is removed by the apiserver (ownerReference set to the now-deleted pod)
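For illustration, the ownerReference described above is roughly what ties the CEP's lifecycle to the pod's. A hedged sketch of building such an object (types and import paths assumed, not taken from Cilium's actual CEP-creation code):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	ciliumv2 "github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2"
)

// newCEPForPod shows how a CiliumEndpoint tied to its pod could be built: the
// ownerReference is what lets the apiserver garbage-collect the CEP once the
// pod object is deleted. Illustrative only.
func newCEPForPod(pod *corev1.Pod) *ciliumv2.CiliumEndpoint {
	return &ciliumv2.CiliumEndpoint{
		ObjectMeta: metav1.ObjectMeta{
			Name:      pod.Name, // CEPs are named after the pod
			Namespace: pod.Namespace,
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "Pod",
				Name:       pod.Name,
				UID:        pod.UID, // references this specific pod instance
			}},
		},
	}
}
```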
@Weil0ng I think that's at the core of my confusion. I think Cilium must follow whatever Kubernetes deems "uniqueness" to be. If StatefulSets are the exception, then Cilium must account for that.
@liuyuan10 The above is why I’m not saying to just go right ahead re: adding a check for containerID in the delete. I think the solution needs to dig one level deeper.
Yes that’s correct.
Correct, yeah, it seems to me that the pod is back up by the time Cilium goes to validate the restored endpoint.
I think it's because the CEP already exists under the pod's name, since CEPs are named after pod names.
According to christarazi, I think it's that when the k8s pod is removed, the apiserver removes the CEP as well, because there is an owner reference to the k8s pod.