kubernetes: daemonset stuck in `Completed` state after a reboot (with graceful kubelet shutdown)

What happened?

I can reproduce this reliably with Calico v3.25.0 CNI, installed straight from the upstream manifest.

It seems to be a regression in Kubernetes 1.27.0-rc.0 compared to 1.27.0-beta.0; at least, I can’t reproduce it with beta.0.

The cluster is deployed and everything is ready and healthy. After a series of node reboots, some calico-node pods get stuck in the Completed state and never recover:

NAMESPACE       NAME                                                   READY   STATUS      RESTARTS        AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
kube-system     calico-kube-controllers-6c99c8747f-q5g8d               1/1     Running     2 (14m ago)     26m     192.168.62.135   talos-default-controlplane-2   <none>           <none>
kube-system     calico-node-hm7pf                                      1/1     Running     2 (12m ago)     25m     172.20.0.2       talos-default-controlplane-1   <none>           <none>
kube-system     calico-node-hvws2                                      1/1     Running     2 (11m ago)     25m     172.20.0.4       talos-default-controlplane-3   <none>           <none>
kube-system     calico-node-kf69z                                      0/1     Completed   1               25m     172.20.0.6       talos-default-worker-2         <none>           <none>
kube-system     calico-node-lpkxj                                      0/1     Completed   1 (18m ago)     25m     172.20.0.5       talos-default-worker-1         <none>           <none>
kube-system     calico-node-pngj7                                      1/1     Running     2 (14m ago)     26m     172.20.0.3       talos-default-controlplane-2   <none>           <none>

This of course disrupts the CNI and, in turn, the workloads scheduled on the affected nodes.

The DaemonSet considers itself “happy” even with 2 pods out:

NAMESPACE     NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS    IMAGES                                    SELECTOR
kube-system   calico-node   5         5         3       5            3           kubernetes.io/os=linux   45m   calico-node   docker.io/calico/node:v3.25.0             k8s-app=calico-node

If I delete a Completed pod manually, the replacement pod comes up without any issues.
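As a stopgap, the manual deletion can be scripted. This is a hypothetical helper (not part of my setup): it selects calico-node pods whose phase is Succeeded — which is what surfaces as STATUS Completed in kubectl get — and deletes them so the DaemonSet controller recreates them. Namespace and label are taken from the output above.

```shell
# Hypothetical workaround sketch: delete DaemonSet pods stuck in phase
# Succeeded so the controller recreates them. Assumes the upstream Calico
# manifest (namespace kube-system, label k8s-app=calico-node).
delete_completed_calico_pods() {
  kubectl get pods -n kube-system -l k8s-app=calico-node \
    --field-selector=status.phase=Succeeded -o name \
  | while read -r pod; do
      # -o name yields "pod/<name>", which kubectl delete accepts directly.
      kubectl -n kube-system delete "$pod"
    done
}
```

`status.phase` is one of the few field selectors supported for pods, so this avoids any client-side filtering.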

What did you expect to happen?

The pods are restarted after the node comes back up (the pod spec has restartPolicy: Always).

How can we reproduce it (as minimally and precisely as possible)?

Bring up a cluster with graceful node shutdown enabled, then reboot nodes a few times (including multiple nodes at once).
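For reference, graceful node shutdown is enabled on the kubelet via KubeletConfiguration. The values below are illustrative, not my exact config (Talos manages the kubelet config itself); any non-zero shutdownGracePeriod activates the feature:

```yaml
# Illustrative KubeletConfiguration fragment enabling graceful node shutdown.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s              # total time reserved for pod termination on shutdown
shutdownGracePeriodCriticalPods: 10s  # portion reserved for critical pods (e.g. system-node-critical)
```

With this active, the kubelet terminates pods on shutdown and marks them with the TerminationByKubelet / DisruptionTarget condition, which matches the pod status shown below.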

Anything else we need to know?

Additional detailed output:

$ kubectl get pods -n kube-system calico-node-kf69z -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-03-30T17:46:42Z"
  generateName: calico-node-
  labels:
    controller-revision-hash: 6c6c65fb6c
    k8s-app: calico-node
    pod-template-generation: "1"
  name: calico-node-kf69z
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: calico-node
    uid: eff56b3d-c824-46f0-b922-26ccb2b94aaa
  resourceVersion: "3335"
  uid: c3624a67-8f41-4449-8024-5a33f7f69f30
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - talos-default-worker-2
  containers:
  - env:
    - name: DATASTORE_TYPE
      value: kubernetes
    - name: WAIT_FOR_DATASTORE
      value: "true"
    - name: NODENAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: CALICO_NETWORKING_BACKEND
      valueFrom:
        configMapKeyRef:
          key: calico_backend
          name: calico-config
    - name: CLUSTER_TYPE
      value: k8s,bgp
    - name: IP
      value: autodetect
    - name: CALICO_IPV4POOL_IPIP
      value: Always
    - name: CALICO_IPV4POOL_VXLAN
      value: Never
    - name: CALICO_IPV6POOL_VXLAN
      value: Never
    - name: FELIX_IPINIPMTU
      valueFrom:
        configMapKeyRef:
          key: veth_mtu
          name: calico-config
    - name: FELIX_VXLANMTU
      valueFrom:
        configMapKeyRef:
          key: veth_mtu
          name: calico-config
    - name: FELIX_WIREGUARDMTU
      valueFrom:
        configMapKeyRef:
          key: veth_mtu
          name: calico-config
    - name: CALICO_DISABLE_FILE_LOGGING
      value: "true"
    - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
      value: ACCEPT
    - name: FELIX_IPV6SUPPORT
      value: "false"
    - name: FELIX_HEALTHENABLED
      value: "true"
    envFrom:
    - configMapRef:
        name: kubernetes-services-endpoint
        optional: true
    image: docker.io/calico/node:v3.25.0
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/calico-node
          - -shutdown
    livenessProbe:
      exec:
        command:
        - /bin/calico-node
        - -felix-live
        - -bird-live
      failureThreshold: 6
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    name: calico-node
    readinessProbe:
      exec:
        command:
        - /bin/calico-node
        - -felix-ready
        - -bird-ready
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    resources:
      requests:
        cpu: 250m
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/etc/cni/net.d
      name: cni-net-dir
    - mountPath: /lib/modules
      name: lib-modules
      readOnly: true
    - mountPath: /run/xtables.lock
      name: xtables-lock
    - mountPath: /var/run/calico
      name: var-run-calico
    - mountPath: /var/lib/calico
      name: var-lib-calico
    - mountPath: /var/run/nodeagent
      name: policysync
    - mountPath: /sys/fs/bpf
      name: bpffs
    - mountPath: /var/log/calico/cni
      name: cni-log-dir
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9r9p4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  initContainers:
  - command:
    - /opt/cni/bin/calico-ipam
    - -upgrade
    env:
    - name: KUBERNETES_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: CALICO_NETWORKING_BACKEND
      valueFrom:
        configMapKeyRef:
          key: calico_backend
          name: calico-config
    envFrom:
    - configMapRef:
        name: kubernetes-services-endpoint
        optional: true
    image: docker.io/calico/cni:v3.25.0
    imagePullPolicy: IfNotPresent
    name: upgrade-ipam
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/cni/networks
      name: host-local-net-dir
    - mountPath: /host/opt/cni/bin
      name: cni-bin-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9r9p4
      readOnly: true
  - command:
    - /opt/cni/bin/install
    env:
    - name: CNI_CONF_NAME
      value: 10-calico.conflist
    - name: CNI_NETWORK_CONFIG
      valueFrom:
        configMapKeyRef:
          key: cni_network_config
          name: calico-config
    - name: KUBERNETES_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: CNI_MTU
      valueFrom:
        configMapKeyRef:
          key: veth_mtu
          name: calico-config
    - name: SLEEP
      value: "false"
    envFrom:
    - configMapRef:
        name: kubernetes-services-endpoint
        optional: true
    image: docker.io/calico/cni:v3.25.0
    imagePullPolicy: IfNotPresent
    name: install-cni
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/opt/cni/bin
      name: cni-bin-dir
    - mountPath: /host/etc/cni/net.d
      name: cni-net-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9r9p4
      readOnly: true
  - command:
    - calico-node
    - -init
    - -best-effort
    image: docker.io/calico/node:v3.25.0
    imagePullPolicy: IfNotPresent
    name: mount-bpffs
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /sys/fs
      mountPropagation: Bidirectional
      name: sys-fs
    - mountPath: /var/run/calico
      mountPropagation: Bidirectional
      name: var-run-calico
    - mountPath: /nodeproc
      name: nodeproc
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9r9p4
      readOnly: true
  nodeName: talos-default-worker-2
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: calico-node
  serviceAccountName: calico-node
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoSchedule
    operator: Exists
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /lib/modules
      type: ""
    name: lib-modules
  - hostPath:
      path: /var/run/calico
      type: ""
    name: var-run-calico
  - hostPath:
      path: /var/lib/calico
      type: ""
    name: var-lib-calico
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: xtables-lock
  - hostPath:
      path: /sys/fs/
      type: DirectoryOrCreate
    name: sys-fs
  - hostPath:
      path: /sys/fs/bpf
      type: Directory
    name: bpffs
  - hostPath:
      path: /proc
      type: ""
    name: nodeproc
  - hostPath:
      path: /opt/cni/bin
      type: ""
    name: cni-bin-dir
  - hostPath:
      path: /etc/cni/net.d
      type: ""
    name: cni-net-dir
  - hostPath:
      path: /var/log/calico/cni
      type: ""
    name: cni-log-dir
  - hostPath:
      path: /var/lib/cni/networks
      type: ""
    name: host-local-net-dir
  - hostPath:
      path: /var/run/nodeagent
      type: DirectoryOrCreate
    name: policysync
  - name: kube-api-access-9r9p4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-03-30T17:56:51Z"
    message: Pod was terminated in response to imminent node shutdown.
    reason: TerminationByKubelet
    status: "True"
    type: DisruptionTarget
  - lastProbeTime: null
    lastTransitionTime: "2023-03-30T17:55:25Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-03-30T17:56:50Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-03-30T17:56:50Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-03-30T17:46:42Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://3bfddceb18e976566f39625fde9f6203c4d06f48f91264b401c945c19712e98d
    image: docker.io/calico/node:v3.25.0
    imageID: docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
    lastState: {}
    name: calico-node
    ready: false
    restartCount: 1
    started: false
    state:
      terminated:
        containerID: containerd://3bfddceb18e976566f39625fde9f6203c4d06f48f91264b401c945c19712e98d
        exitCode: 0
        finishedAt: "2023-03-30T17:56:24Z"
        reason: Completed
        startedAt: "2023-03-30T17:55:26Z"
  hostIP: 172.20.0.6
  initContainerStatuses:
  - containerID: containerd://78f7523609b616ab91b115721d7db40473b656a2d0d40c97477263e87b734aa2
    image: docker.io/calico/cni:v3.25.0
    imageID: docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
    lastState: {}
    name: upgrade-ipam
    ready: true
    restartCount: 2
    state:
      terminated:
        containerID: containerd://78f7523609b616ab91b115721d7db40473b656a2d0d40c97477263e87b734aa2
        exitCode: 0
        finishedAt: "2023-03-30T17:56:50Z"
        reason: Completed
        startedAt: "2023-03-30T17:56:50Z"
  - containerID: containerd://4ff6b96980023828a85e1527297dd07de61a9cfdc5692fce0635d9b39a76ee5f
    image: docker.io/calico/cni:v3.25.0
    imageID: docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
    lastState: {}
    name: install-cni
    ready: true
    restartCount: 1
    state:
      terminated:
        containerID: containerd://4ff6b96980023828a85e1527297dd07de61a9cfdc5692fce0635d9b39a76ee5f
        exitCode: 0
        finishedAt: "2023-03-30T17:55:25Z"
        reason: Completed
        startedAt: "2023-03-30T17:55:24Z"
  - containerID: containerd://32f143e7f1eb8f81b146941d21f93225c15bdece80f2d5ef39bd862fc52c96f3
    image: docker.io/calico/node:v3.25.0
    imageID: docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
    lastState: {}
    name: mount-bpffs
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://32f143e7f1eb8f81b146941d21f93225c15bdece80f2d5ef39bd862fc52c96f3
        exitCode: 0
        finishedAt: "2023-03-30T17:55:25Z"
        reason: Completed
        startedAt: "2023-03-30T17:55:25Z"
  phase: Succeeded
  podIP: 172.20.0.6
  podIPs:
  - ip: 172.20.0.6
  qosClass: Burstable
  startTime: "2023-03-30T17:47:26Z"
$ kubectl -n kube-system describe pod calico-node-kf69z
Name:                 calico-node-kf69z
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 talos-default-worker-2/172.20.0.6
Start Time:           Thu, 30 Mar 2023 21:47:26 +0400
Labels:               controller-revision-hash=6c6c65fb6c
                      k8s-app=calico-node
                      pod-template-generation=1
Annotations:          <none>
Status:               Succeeded
IP:                   172.20.0.6
IPs:
  IP:           172.20.0.6
Controlled By:  DaemonSet/calico-node
Init Containers:
  upgrade-ipam:
    Container ID:  containerd://78f7523609b616ab91b115721d7db40473b656a2d0d40c97477263e87b734aa2
    Image:         docker.io/calico/cni:v3.25.0
    Image ID:      docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 30 Mar 2023 21:56:50 +0400
      Finished:     Thu, 30 Mar 2023 21:56:50 +0400
    Ready:          True
    Restart Count:  2
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
  install-cni:
    Container ID:  containerd://4ff6b96980023828a85e1527297dd07de61a9cfdc5692fce0635d9b39a76ee5f
    Image:         docker.io/calico/cni:v3.25.0
    Image ID:      docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 30 Mar 2023 21:55:24 +0400
      Finished:     Thu, 30 Mar 2023 21:55:25 +0400
    Ready:          True
    Restart Count:  1
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
  mount-bpffs:
    Container ID:  containerd://32f143e7f1eb8f81b146941d21f93225c15bdece80f2d5ef39bd862fc52c96f3
    Image:         docker.io/calico/node:v3.25.0
    Image ID:      docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
    Port:          <none>
    Host Port:     <none>
    Command:
      calico-node
      -init
      -best-effort
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 30 Mar 2023 21:55:25 +0400
      Finished:     Thu, 30 Mar 2023 21:55:25 +0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /nodeproc from nodeproc (ro)
      /sys/fs from sys-fs (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
Containers:
  calico-node:
    Container ID:   containerd://3bfddceb18e976566f39625fde9f6203c4d06f48f91264b401c945c19712e98d
    Image:          docker.io/calico/node:v3.25.0
    Image ID:       docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 30 Mar 2023 21:55:26 +0400
      Finished:     Thu, 30 Mar 2023 21:56:24 +0400
    Ready:          False
    Restart Count:  1
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      CALICO_IPV6POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_HEALTHENABLED:                true
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/bpf from bpffs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
Conditions:
  Type               Status
  DisruptionTarget   True 
  Initialized        True 
  Ready              False 
  ContainersReady    False 
  PodScheduled       True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sys-fs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  bpffs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/bpf
    HostPathType:  Directory
  nodeproc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:  
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  kube-api-access-9r9p4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       17m                    default-scheduler  Successfully assigned kube-system/calico-node-kf69z to talos-default-worker-2
  Normal   Pulling         16m                    kubelet            Pulling image "docker.io/calico/cni:v3.25.0"
  Normal   Pulled          16m                    kubelet            Successfully pulled image "docker.io/calico/cni:v3.25.0" in 2.574137942s (2.574164652s including waiting)
  Normal   Created         16m                    kubelet            Created container upgrade-ipam
  Normal   Started         16m                    kubelet            Started container upgrade-ipam
  Normal   Pulled          16m                    kubelet            Container image "docker.io/calico/cni:v3.25.0" already present on machine
  Normal   Created         16m                    kubelet            Created container install-cni
  Normal   Started         16m                    kubelet            Started container install-cni
  Normal   Pulling         16m                    kubelet            Pulling image "docker.io/calico/node:v3.25.0"
  Normal   Pulled          16m                    kubelet            Successfully pulled image "docker.io/calico/node:v3.25.0" in 2.909875156s (2.909886566s including waiting)
  Normal   Created         16m                    kubelet            Created container mount-bpffs
  Normal   Started         16m                    kubelet            Started container mount-bpffs
  Normal   Pulled          16m                    kubelet            Container image "docker.io/calico/node:v3.25.0" already present on machine
  Normal   Created         16m                    kubelet            Created container calico-node
  Normal   Started         16m                    kubelet            Started container calico-node
  Warning  Unhealthy       16m (x2 over 16m)      kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
  Warning  Unhealthy       16m                    kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Normal   Killing         10m                    kubelet            Stopping container calico-node
  Normal   SandboxChanged  9m6s                   kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          9m6s                   kubelet            Container image "docker.io/calico/cni:v3.25.0" already present on machine
  Normal   Created         9m6s                   kubelet            Created container upgrade-ipam
  Normal   Started         9m6s                   kubelet            Started container upgrade-ipam
  Normal   Pulled          8m34s (x2 over 9m5s)   kubelet            Container image "docker.io/calico/cni:v3.25.0" already present on machine
  Normal   Created         8m34s (x2 over 9m5s)   kubelet            Created container install-cni
  Normal   Started         8m34s (x2 over 9m5s)   kubelet            Started container install-cni
  Normal   Pulled          8m33s                  kubelet            Container image "docker.io/calico/node:v3.25.0" already present on machine
  Normal   Created         8m33s                  kubelet            Created container mount-bpffs
  Normal   Started         8m33s                  kubelet            Started container mount-bpffs
  Normal   Pulled          8m32s                  kubelet            Container image "docker.io/calico/node:v3.25.0" already present on machine
  Normal   Created         8m32s                  kubelet            Created container calico-node
  Normal   Started         8m32s                  kubelet            Started container calico-node
  Warning  Unhealthy       8m30s (x2 over 8m31s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy       8m26s                  kubelet            Readiness probe failed: 2023-03-30 17:55:32.710 [INFO][227] confd/health.go 180: Number of node(s) with BGP peering established = 3
calico/node is not ready: BIRD is not ready: BGP not established with 172.20.0.2
  Warning  Unhealthy  8m16s  kubelet  Readiness probe failed: 2023-03-30 17:55:42.727 [INFO][249] confd/health.go 180: Number of node(s) with BGP peering established = 3
calico/node is not ready: BIRD is not ready: BGP not established with 172.20.0.2
  Normal  Killing         7m34s  kubelet  Stopping container calico-node
  Normal  SandboxChanged  7m8s   kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  Pulled          7m8s   kubelet  Container image "docker.io/calico/cni:v3.25.0" already present on machine
  Normal  Created         7m8s   kubelet  Created container upgrade-ipam
  Normal  Started         7m8s   kubelet  Started container upgrade-ipam
$ kubectl describe ds -n kube-system calico-node
Name:           calico-node
Selector:       k8s-app=calico-node
Node-Selector:  kubernetes.io/os=linux
Labels:         k8s-app=calico-node
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 5
Current Number of Nodes Scheduled: 5
Number of Nodes Scheduled with Up-to-date Pods: 5
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status:  3 Running / 0 Waiting / 2 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=calico-node
  Service Account:  calico-node
  Init Containers:
   upgrade-ipam:
    Image:      docker.io/calico/cni:v3.25.0
    Port:       <none>
    Host Port:  <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
   install-cni:
    Image:      docker.io/calico/cni:v3.25.0
    Port:       <none>
    Host Port:  <none>
    Command:
      /opt/cni/bin/install
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
   mount-bpffs:
    Image:      docker.io/calico/node:v3.25.0
    Port:       <none>
    Host Port:  <none>
    Command:
      calico-node
      -init
      -best-effort
    Environment:  <none>
    Mounts:
      /nodeproc from nodeproc (ro)
      /sys/fs from sys-fs (rw)
      /var/run/calico from var-run-calico (rw)
  Containers:
   calico-node:
    Image:      docker.io/calico/node:v3.25.0
    Port:       <none>
    Host Port:  <none>
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      CALICO_IPV6POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_HEALTHENABLED:                true
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/bpf from bpffs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
  Volumes:
   lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
   var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
   var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
   xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
   sys-fs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
   bpffs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/bpf
    HostPathType:  Directory
   nodeproc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  
   cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
   cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
   cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:  
   host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
   policysync:
    Type:               HostPath (bare host directory volume)
    Path:               /var/run/nodeagent
    HostPathType:       DirectoryOrCreate
  Priority Class Name:  system-node-critical
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  SuccessfulCreate  49m   daemonset-controller  Created pod: calico-node-pngj7
  Normal  SuccessfulCreate  49m   daemonset-controller  Created pod: calico-node-kf69z
  Normal  SuccessfulCreate  49m   daemonset-controller  Created pod: calico-node-lpkxj
  Normal  SuccessfulCreate  49m   daemonset-controller  Created pod: calico-node-hm7pf
  Normal  SuccessfulCreate  49m   daemonset-controller  Created pod: calico-node-hvws2

Kubernetes version

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.0-rc.0", GitCommit:"cf60ee25590ac79334f823b7f03cf33105a442c1", GitTreeState:"clean", BuildDate:"2023-03-23T18:59:06Z", GoVersion:"go1.20.2", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

none

OS version

Talos Linux 1.4.0-alpha.3

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

Calico v3.25.0, also reproducible with 3.24.1

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 19 (19 by maintainers)

Most upvoted comments

At the same time I don’t quite understand what exactly is wrong: if I create a daemonset whose container just runs `sleep 10`, the pod also enters the Completed state, but the kubelet quickly restarts it via the CrashLoopBackOff transition.
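The `sleep 10` reproducer mentioned above can be written as a minimal manifest (the name, namespace, and image here are illustrative, not taken from the original report):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sleep-test          # illustrative name
  namespace: default
spec:
  selector:
    matchLabels:
      app: sleep-test
  template:
    metadata:
      labels:
        app: sleep-test
    spec:
      # DaemonSet pods always run with restartPolicy: Always, so a
      # container that exits after `sleep 10` is recreated by the kubelet
      # (eventually via CrashLoopBackOff) instead of the pod settling
      # into the Succeeded phase.
      containers:
        - name: sleep
          image: busybox
          command: ["sleep", "10"]
```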

This is because of `restartPolicy: Always`. In that case the pod never reaches the Succeeded phase; the kubelet simply recreates the containers. However, once a pod is actually marked Succeeded, the kubelet stops looking at it.
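The distinction can be sketched as a toy decision function — this is only an illustration of the rule described above, not actual kubelet code:

```python
def kubelet_restarts_container(pod_phase: str, restart_policy: str,
                               container_exit_code: int) -> bool:
    """Toy model of the behavior described above -- not real kubelet logic.

    Once a pod is in a terminal phase, the kubelet stops managing it
    entirely; otherwise restartPolicy decides what happens when a
    container exits.
    """
    if pod_phase in ("Succeeded", "Failed"):
        # Terminal phase: the kubelet no longer looks at the pod, so only
        # a controller recreating the pod object can bring it back.
        return False
    if restart_policy == "Always":
        return True                      # restart regardless of exit code
    if restart_policy == "OnFailure":
        return container_exit_code != 0  # restart only on non-zero exit
    return False                         # restartPolicy: Never

# The sleep-10 daemonset pod: phase stays Running, policy Always -> restarted.
assert kubelet_restarts_container("Running", "Always", 0) is True
# The stuck calico-node pod: phase Succeeded -> the kubelet ignores it.
assert kubelet_restarts_container("Succeeded", "Always", 0) is False
```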

However, recreating the pod in that situation should be the responsibility of the daemonset controller.