calico: calico/node token is invalidated by Kubernetes when the pod is evicted, leading to CNI failures
Expected Behavior
Calico CNI plugin tears down Pods in a timely manner.
Current Behavior
Calico CNI plugin reports errors when tearing down Pods, so eviction takes too long. This is especially relevant in Kubernetes conformance testing.
Aug 18 18:19:04.521: INFO: At 2021-08-18 18:18:01 +0000 UTC - event for taint-eviction-a1: {kubelet ip-10-0-8-52} FailedKillPod: error killing pod: failed to "KillPodSandbox" for "0701ef9b-e17d-43b5-a48f-89fa3ac00999" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"taint-eviction-a1_taint-multiple-pods-4011\" network: error getting ClusterInformation: connection is unauthorized: Unauthorized"
The natural things to check are RBAC permissions, which match recommendations:
- apiGroups:
    - crd.projectcalico.org
  resources:
    - globalfelixconfigs
    - felixconfigurations
    - bgppeers
    - globalbgpconfigs
    - bgpconfigurations
    - ippools
    - ipamblocks
    - globalnetworkpolicies
    - globalnetworksets
    - networkpolicies
    - networksets
    - clusterinformations
    - hostendpoints
    - blockaffinities
  verbs:
    - get
    - list
    - watch
...
To be certain, we can use the actual kubeconfig Calico writes to the host’s /etc/cni/net.d. It does indeed seem to have permission to get clusterinformations. The error above is unusual.
./kubectl --kubeconfig /etc/cni/net.d/calico-kubeconfig auth can-i get clusterinformations --all-namespaces
yes
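Unauthorized (as opposed to Forbidden) suggests the API server is rejecting the bearer token itself rather than denying an RBAC permission. One way to confirm that, sketched here as a diagnostic I'm adding (not something from the original report), is to ask the API server, using regular admin credentials, whether it still accepts the token embedded in the CNI kubeconfig:

# Pull the bearer token out of the CNI kubeconfig written by calico-node,
# then submit a TokenReview with admin credentials. If .status.authenticated
# comes back false, the token itself has been revoked (e.g. because the
# calico-node pod it was bound to is gone), which would explain
# "connection is unauthorized: Unauthorized" during teardown.
TOKEN=$(awk '/token:/ {print $2}' /etc/cni/net.d/calico-kubeconfig)
kubectl create -o yaml -f - <<EOF
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: ${TOKEN}
EOF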
Steps to Reproduce (for bugs)
sonobuoy run --e2e-focus="NoExecuteTaintManager Multiple Pods" --e2e-skip="" \
--plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/controller"
Context
This issue affects Kubernetes Conformance tests:
Summarizing 1 Failure:
[Fail] [sig-node] NoExecuteTaintManager Multiple Pods [Serial] [It] evicts pods with minTolerationSeconds [Disruptive] [Conformance]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes/runner.go:113
The test in question creates two Pods that don’t tolerate a taint, and expects them to be terminated within certain times. In Kubelet logs, the Calico CNI plugin is complaining with the logs above and termination takes too long.
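The test's pods are shaped roughly like the sketch below (the taint key, image, and timing values are placeholders of mine, not copied from the e2e source): each pod tolerates the test's NoExecute taint only for a bounded time, so the taint manager must evict it shortly after that window expires, and a slow CNI teardown pushes the eviction past the test's deadline.

apiVersion: v1
kind: Pod
metadata:
  name: taint-eviction-a1
spec:
  containers:
    - name: pause
      image: k8s.gcr.io/pause:3.5   # placeholder image
  tolerations:
    # Tolerate the NoExecute taint applied by the test for a limited time only;
    # eviction is expected promptly once tolerationSeconds elapses.
    - key: example.com/evict-taint   # placeholder key, not the real e2e taint key
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 60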
Your Environment
- Calico version: v3.19.2 or v3.20.0
- Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes v1.22.0
- Operating System and version: Fedora CoreOS
- Link to your project (optional): https://github.com/poseidon/typhoon, Calico manifests https://github.com/poseidon/terraform-render-bootstrap/tree/master/resources/calico
About this issue
- State: closed
- Created 3 years ago
- Comments: 30 (27 by maintainers)
@caseydavenport We have run into this issue in one of our clusters in a slightly different scenario. For bin-packing reasons, we scale the resource requests of calico-node vertically in a cluster-proportional manner (pretty similar to https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/calico-policy-controller). As the cluster grew in size, calico-node was supposed to be recreated with bigger memory requests, i.e. the existing pod was deleted and a new one with higher memory requests created. However, as the node was nearly fully loaded, there was not enough space for the new pod. Thanks to the priority class of calico-node, pre-emption occurred and the kube-scheduler tried to get rid of a lower-priority pod on the node. However, now we ran into the problem that the lower-priority pod could not be deleted, as network sandbox deletion via CNI fails with this error ("error getting ClusterInformation: connection is unauthorized: Unauthorized") because the token in Calico's kubeconfig belongs to a deleted pod. The node cannot automatically recover from this, as no pod can be completely removed due to the CNI error and calico-node cannot be scheduled because the memory requirements are not fulfilled.
Is there a plan to resolve this issue, for example by using the token API directly or otherwise decoupling the validity of the token used for CNI from the calico-node pod lifecycle?
@caseydavenport Feel free to take a look at #5910 if you have some time to spare. I will be on vacation next week, though. Hence, there is no hurry from my side.
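On the question of decoupling the token from the pod lifecycle, a minimal sketch of the idea (an illustration only, not what the maintainers ultimately shipped) is to request a token bound to the calico-node ServiceAccount itself via the TokenRequest API, e.g. with kubectl v1.24+:

# Assumes the calico-node ServiceAccount lives in kube-system, as in the
# upstream manifests. A token issued this way is bound to the ServiceAccount
# rather than to one pod, so deleting an individual calico-node pod does not
# revoke it.
kubectl -n kube-system create token calico-node --duration=24h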
To pass Kubernetes v1.22 conformance testing, Typhoon used flannel instead of Calico. I’ll re-run with Calico during the v1.23 cycle.
When 10-calico.conflist is written out, __KUBECONFIG_FILEPATH__ is replaced with /etc/cni/net.d/calico-kubeconfig. Within the calico-node container, the location of the mounted file is /host/cni/net.d/calico-kubeconfig. Maybe that's not what calico-node wants, but it's been this way a while.
That seems to match what Calico's release calico.yaml shows here, and Calico reports no troubles finding a kubeconfig either.
Setting calico-node's env didn't seem to alter the result.
Thanks for looking into this folks!
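For reference, the CNI network config is what points the plugin at that kubeconfig. The fragment below is an abbreviated approximation of a 10-calico.conflist after the placeholder substitution, trimmed by me rather than copied from this cluster:

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "datastore_type": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    }
  ]
}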
I know CNI docs often show an "allow everywhere" toleration (i.e. operator: Exists without a key). However, we can't ship that. Clusters support many platforms (clouds, bare-metal) and heterogeneous nodes with different properties (e.g. worker pools with different OSes, architectures, resources, hardware, etc.). Choosing on behalf of users that a Calico DaemonSet should be on ALL nodes would limit use cases.
Instead, Typhoon allows kube-system DaemonSet tolerations to be configured, to support those more advanced cases. Here's one example (though Typhoon doesn't support ARM64 if Calico is chosen).
From your investigation, it sounds like having this conformance test pass will require listing what those expected taints are, and provisioning the cluster so that Calico tolerates them. I suppose the reason Cilium and flannel don't hit this is because they're not relying on credentials in the same way.
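For completeness, here is a sketch of what such an explicit toleration list for the calico-node DaemonSet could look like; the keys are examples (the controller-role taint from the reproduce command above, the standard node lifecycle taints, and the placeholder eviction-test key used earlier), not a vetted recommendation from this thread:

tolerations:
  # Controller/master role taint referenced by --non-blocking-taints above.
  - key: node-role.kubernetes.io/controller
    operator: Exists
  # Standard node lifecycle taints, so calico-node keeps running while a node
  # is not-ready or unreachable.
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
  # Placeholder for the NoExecute taint applied by the eviction e2e test.
  - key: example.com/evict-taint
    operator: Exists
    effect: NoExecute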