calico: calico/node token is invalidated by Kubernetes when the pod is evicted, leading to CNI failures

Expected Behavior

The Calico CNI plugin tears down Pods in a timely manner.

Current Behavior

The Calico CNI plugin reports errors when terminating Pods, so eviction takes too long. This is especially relevant in Kubernetes conformance testing.

Aug 18 18:19:04.521: INFO: At 2021-08-18 18:18:01 +0000 UTC - event for taint-eviction-a1: {kubelet ip-10-0-8-52} FailedKillPod: error killing pod: failed to "KillPodSandbox" for "0701ef9b-e17d-43b5-a48f-89fa3ac00999" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"taint-eviction-a1_taint-multiple-pods-4011\" network: error getting ClusterInformation: connection is unauthorized: Unauthorized"

The natural first thing to check is RBAC permissions, which match Calico's recommendations:

- apiGroups:
  - crd.projectcalico.org
  resources:
  - globalfelixconfigs
  - felixconfigurations
  - bgppeers
  - globalbgpconfigs
  - bgpconfigurations
  - ippools
  - ipamblocks
  - globalnetworkpolicies
  - globalnetworksets
  - networkpolicies
  - networksets
  - clusterinformations
  - hostendpoints
  - blockaffinities
  verbs:
  - get
  - list
  - watch
...
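
As an additional sanity check, the grant itself can be verified by impersonating the service account (an assumption here: the default kube-system/calico-node account, run with cluster-admin credentials). This exercises only authorization, independent of any token written to disk:

kubectl auth can-i get clusterinformations.crd.projectcalico.org \
  --as=system:serviceaccount:kube-system:calico-node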

To be certain, we can use the actual kubeconfig that Calico writes to /etc/cni/net.d on the host. It does indeed appear to have permission to get clusterinformations, which makes the error above unusual.

./kubectl --kubeconfig /etc/cni/net.d/calico-kubeconfig auth can-i get clusterinformations --all-namespaces
yes
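
The check above passes because, at that moment, the token in the kubeconfig is still valid; the failure only appears once the calico-node pod that the token is bound to is gone. A hypothetical way to see that binding (not part of the original report) is to decode the token's payload:

# Hypothetical check: decode the service account token Calico wrote into the
# CNI kubeconfig. With bound token volumes (default since Kubernetes v1.21),
# the payload carries a "kubernetes.io" claim naming the calico-node pod;
# once that pod is deleted, the API server rejects the token with 401.
TOKEN=$(awk '/token:/ {gsub(/"/, ""); print $2}' /etc/cni/net.d/calico-kubeconfig)
# The JWT payload is the second dot-separated segment (base64url-encoded JSON).
echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null; echo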

Steps to Reproduce (for bugs)

sonobuoy run --e2e-focus="NoExecuteTaintManager Multiple Pods" --e2e-skip="" \
--plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/controller"

Context

This issue affects Kubernetes Conformance tests:

Summarizing 1 Failure:

[Fail] [sig-node] NoExecuteTaintManager Multiple Pods [Serial] [It] evicts pods with minTolerationSeconds [Disruptive] [Conformance] 
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes/runner.go:113

The test in question creates two Pods that do not tolerate a taint and expects them to be terminated within specific time bounds. In the kubelet logs, the Calico CNI plugin reports the errors shown above, and termination takes too long.
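
For illustration only (not the exact e2e spec): when a NoExecute taint appears on a node, pods that do not tolerate it are evicted immediately, while pods that tolerate it with a bounded tolerationSeconds are evicted after roughly that many seconds. A hypothetical toleration fragment:

# Hypothetical fragment; the e2e test's actual taint key differs. Once a
# matching NoExecute taint lands on the node, the taint manager deletes
# this pod after at most ~60 seconds.
tolerations:
- key: example.com/e2e-evict
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60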

Your Environment

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (27 by maintainers)

Most upvoted comments

@caseydavenport We have run into this issue in one of our clusters in a slightly different scenario. For bin-packing reasons, we scale the resource requests of calico-node vertically in a cluster-proportional manner (pretty similar to https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/calico-policy-controller). As the cluster grew, calico-node was supposed to be recreated with bigger memory requests, i.e. the existing pod was deleted and a new one with higher memory requests was created. However, as the node was nearly fully loaded, there was not enough room for the new pod. Thanks to calico-node's priority class, preemption occurred and the kube-scheduler tried to get rid of a lower-priority pod on the node.

We then ran into the problem that the lower-priority pod could not be deleted, because network sandbox deletion via CNI fails with this error ("error getting ClusterInformation: connection is unauthorized: Unauthorized"), as the token in Calico's kubeconfig belongs to a deleted pod. The node cannot recover from this automatically: no pod can be completely removed due to the CNI error, and calico-node cannot be scheduled because its memory requirements are not fulfilled.

Is there a plan to resolve this issue, for example by using the token API (TokenRequest) directly or by otherwise decoupling the validity of the token used for CNI from the calico-node pod lifecycle?
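
A minimal sketch of that idea, assuming kubectl v1.24+ (which exposes the TokenRequest API as "kubectl create token"): a token requested for the service account itself, rather than the one projected into a specific pod, is only invalidated by expiry or by deleting the service account, not by deleting the calico-node pod.

# Hypothetical example: request a token bound only to the calico-node
# service account, not to a particular pod, so pod deletion does not revoke it.
kubectl -n kube-system create token calico-node --duration=24h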

@caseydavenport Feel free to take a look at #5910 if you have some time to spare. I will be on vacation next week, though. Hence, there is no hurry from my side.

To pass Kubernetes v1.22 conformance testing, Typhoon used flannel instead of Calico. I’ll re-run with Calico during the v1.23 cycle.

When 10-calico.conflist is written out, __KUBECONFIG_FILEPATH__ is replaced with /etc/cni/net.d/calico-kubeconfig. Within the calico-node container, the mounted file is located at /host/cni/net.d/calico-kubeconfig. Maybe that's not what calico-node expects, but it has been this way for a while.

That seems to match what Calico's release manifest (calico.yaml) shows, and Calico reports no trouble finding a kubeconfig either.

Setting the following calico-node environment variable didn't seem to alter the result:

- name: CALICO_MANAGE_CNI
  value: "true"

Thanks for looking into this, folks!

I know CNI providers' docs often show an "allow everywhere" toleration (i.e. operator: Exists without a key). However, we can't ship that. Our clusters support many platforms (clouds, bare-metal) and heterogeneous nodes with different properties (e.g. worker pools with different OSes, architectures, resources, hardware, etc.).

Choosing on behalf of users that a Calico DaemonSet should run on ALL nodes would limit use cases. For example:

  • Clusters that mix x86 and arm64 nodes: Calico doesn't ship a typical multi-arch container image; it ships an image per architecture, which requires separate DaemonSets matching subsets of nodes (see the sketch after the tolerations block below)
  • Clusters that use a mix of CNI providers on different pools of workers
  • Certain cases where you don't want a CNI provider on a set of nodes at all

tolerations:
  # Make sure calico-node gets scheduled on all nodes.    <- Good for simple clusters (90% use case)
  - effect: NoSchedule
    operator: Exists
  # Mark the pod as a critical add-on for rescheduling.   <- Deprecated
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute                                      <- Good for simple clusters
    operator: Exists
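
To make the multi-architecture bullet above concrete, each per-architecture DaemonSet would have to be pinned to matching nodes, e.g. with a node selector. A hypothetical fragment (not from the Typhoon or Calico manifests):

# Hypothetical DaemonSet fragment: pin an arm64-only calico-node image to arm64 nodes.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64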

Instead, Typhoon allows kube-system DaemonSet tolerations to be configured, to support those more advanced cases. Here’s one example (though Typhoon doesn’t support ARM64 if Calico is chosen).

 tolerations:
      - key: node-role.kubernetes.io/controller
        operator: Exists
      - key: node.kubernetes.io/not-ready
        operator: Exists
      %{~ for key in daemonset_tolerations ~}
      - key: ${key}
        operator: Exists
      %{~ endfor ~}
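
On the user side, a hypothetical Typhoon configuration then just lists the extra taint keys that kube-system DaemonSets should tolerate:

# Hypothetical value; the variable is the one iterated over in the template above.
daemonset_tolerations = ["node-role.kubernetes.io/storage"]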

From your investigation, it sounds like passing this conformance test will require listing the expected taints and provisioning the cluster so that Calico tolerates them. I suppose Cilium and flannel don't hit this because they don't rely on credentials in the same way.