kubernetes: daemonset stuck in `Completed` state after a reboot (with graceful kubelet shutdown)
What happened?
I can reproduce this reliably with Calico v3.25.0 CNI, installed straight from the upstream manifest.
It seems to be a regression in Kubernetes 1.27.0-rc.0 compared to 1.27.0-beta.0, at least I can’t reproduce it with beta.0.
Cluster is deployed, and everything is ready & healthy. After a series of node reboots, some `calico-node` pods are stuck in the `Completed` state and never recover:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-kube-controllers-6c99c8747f-q5g8d 1/1 Running 2 (14m ago) 26m 192.168.62.135 talos-default-controlplane-2 <none> <none>
kube-system calico-node-hm7pf 1/1 Running 2 (12m ago) 25m 172.20.0.2 talos-default-controlplane-1 <none> <none>
kube-system calico-node-hvws2 1/1 Running 2 (11m ago) 25m 172.20.0.4 talos-default-controlplane-3 <none> <none>
kube-system calico-node-kf69z 0/1 Completed 1 25m 172.20.0.6 talos-default-worker-2 <none> <none>
kube-system calico-node-lpkxj 0/1 Completed 1 (18m ago) 25m 172.20.0.5 talos-default-worker-1 <none> <none>
kube-system calico-node-pngj7 1/1 Running 2 (14m ago) 26m 172.20.0.3 talos-default-controlplane-2 <none> <none>
This of course disrupts the CNI and affects the scheduled workloads.
The DaemonSet considers itself “happy” even with 2 of its pods out:
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system calico-node 5 5 3 5 3 kubernetes.io/os=linux 45m calico-node docker.io/calico/node:v3.25.0 k8s-app=calico-node
If I delete the `Completed` pod manually, the pod recovers without any issues.
What did you expect to happen?
The pods are restarted after the node comes back up.
How can we reproduce it (as minimally and precisely as possible)?
Bring up a cluster with graceful node shutdown enabled, then reboot the nodes a few times (including multiple nodes at once).
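For reference, graceful node shutdown is enabled via the kubelet configuration. A minimal sketch of such a config fragment; the durations below are illustrative example values, not the ones used in this cluster:

```yaml
# KubeletConfiguration fragment enabling graceful node shutdown.
# Grace periods here are example values, not taken from this report.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
```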
Anything else we need to know?
Additional detailed output:
$ kubectl get pods -n kube-system calico-node-kf69z -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2023-03-30T17:46:42Z"
generateName: calico-node-
labels:
controller-revision-hash: 6c6c65fb6c
k8s-app: calico-node
pod-template-generation: "1"
name: calico-node-kf69z
namespace: kube-system
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: DaemonSet
name: calico-node
uid: eff56b3d-c824-46f0-b922-26ccb2b94aaa
resourceVersion: "3335"
uid: c3624a67-8f41-4449-8024-5a33f7f69f30
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- talos-default-worker-2
containers:
- env:
- name: DATASTORE_TYPE
value: kubernetes
- name: WAIT_FOR_DATASTORE
value: "true"
- name: NODENAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CALICO_NETWORKING_BACKEND
valueFrom:
configMapKeyRef:
key: calico_backend
name: calico-config
- name: CLUSTER_TYPE
value: k8s,bgp
- name: IP
value: autodetect
- name: CALICO_IPV4POOL_IPIP
value: Always
- name: CALICO_IPV4POOL_VXLAN
value: Never
- name: CALICO_IPV6POOL_VXLAN
value: Never
- name: FELIX_IPINIPMTU
valueFrom:
configMapKeyRef:
key: veth_mtu
name: calico-config
- name: FELIX_VXLANMTU
valueFrom:
configMapKeyRef:
key: veth_mtu
name: calico-config
- name: FELIX_WIREGUARDMTU
valueFrom:
configMapKeyRef:
key: veth_mtu
name: calico-config
- name: CALICO_DISABLE_FILE_LOGGING
value: "true"
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
value: ACCEPT
- name: FELIX_IPV6SUPPORT
value: "false"
- name: FELIX_HEALTHENABLED
value: "true"
envFrom:
- configMapRef:
name: kubernetes-services-endpoint
optional: true
image: docker.io/calico/node:v3.25.0
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /bin/calico-node
- -shutdown
livenessProbe:
exec:
command:
- /bin/calico-node
- -felix-live
- -bird-live
failureThreshold: 6
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: calico-node
readinessProbe:
exec:
command:
- /bin/calico-node
- -felix-ready
- -bird-ready
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 250m
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /run/xtables.lock
name: xtables-lock
- mountPath: /var/run/calico
name: var-run-calico
- mountPath: /var/lib/calico
name: var-lib-calico
- mountPath: /var/run/nodeagent
name: policysync
- mountPath: /sys/fs/bpf
name: bpffs
- mountPath: /var/log/calico/cni
name: cni-log-dir
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-9r9p4
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostNetwork: true
initContainers:
- command:
- /opt/cni/bin/calico-ipam
- -upgrade
env:
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CALICO_NETWORKING_BACKEND
valueFrom:
configMapKeyRef:
key: calico_backend
name: calico-config
envFrom:
- configMapRef:
name: kubernetes-services-endpoint
optional: true
image: docker.io/calico/cni:v3.25.0
imagePullPolicy: IfNotPresent
name: upgrade-ipam
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/cni/networks
name: host-local-net-dir
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-9r9p4
readOnly: true
- command:
- /opt/cni/bin/install
env:
- name: CNI_CONF_NAME
value: 10-calico.conflist
- name: CNI_NETWORK_CONFIG
valueFrom:
configMapKeyRef:
key: cni_network_config
name: calico-config
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CNI_MTU
valueFrom:
configMapKeyRef:
key: veth_mtu
name: calico-config
- name: SLEEP
value: "false"
envFrom:
- configMapRef:
name: kubernetes-services-endpoint
optional: true
image: docker.io/calico/cni:v3.25.0
imagePullPolicy: IfNotPresent
name: install-cni
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-9r9p4
readOnly: true
- command:
- calico-node
- -init
- -best-effort
image: docker.io/calico/node:v3.25.0
imagePullPolicy: IfNotPresent
name: mount-bpffs
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /sys/fs
mountPropagation: Bidirectional
name: sys-fs
- mountPath: /var/run/calico
mountPropagation: Bidirectional
name: var-run-calico
- mountPath: /nodeproc
name: nodeproc
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-9r9p4
readOnly: true
nodeName: talos-default-worker-2
nodeSelector:
kubernetes.io/os: linux
preemptionPolicy: PreemptLowerPriority
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: calico-node
serviceAccountName: calico-node
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoSchedule
operator: Exists
- key: CriticalAddonsOnly
operator: Exists
- effect: NoExecute
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/disk-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/memory-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/pid-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/unschedulable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/network-unavailable
operator: Exists
volumes:
- hostPath:
path: /lib/modules
type: ""
name: lib-modules
- hostPath:
path: /var/run/calico
type: ""
name: var-run-calico
- hostPath:
path: /var/lib/calico
type: ""
name: var-lib-calico
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
- hostPath:
path: /sys/fs/
type: DirectoryOrCreate
name: sys-fs
- hostPath:
path: /sys/fs/bpf
type: Directory
name: bpffs
- hostPath:
path: /proc
type: ""
name: nodeproc
- hostPath:
path: /opt/cni/bin
type: ""
name: cni-bin-dir
- hostPath:
path: /etc/cni/net.d
type: ""
name: cni-net-dir
- hostPath:
path: /var/log/calico/cni
type: ""
name: cni-log-dir
- hostPath:
path: /var/lib/cni/networks
type: ""
name: host-local-net-dir
- hostPath:
path: /var/run/nodeagent
type: DirectoryOrCreate
name: policysync
- name: kube-api-access-9r9p4
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-03-30T17:56:51Z"
message: Pod was terminated in response to imminent node shutdown.
reason: TerminationByKubelet
status: "True"
type: DisruptionTarget
- lastProbeTime: null
lastTransitionTime: "2023-03-30T17:55:25Z"
reason: PodCompleted
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-03-30T17:56:50Z"
reason: PodCompleted
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-03-30T17:56:50Z"
reason: PodCompleted
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-03-30T17:46:42Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://3bfddceb18e976566f39625fde9f6203c4d06f48f91264b401c945c19712e98d
image: docker.io/calico/node:v3.25.0
imageID: docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
lastState: {}
name: calico-node
ready: false
restartCount: 1
started: false
state:
terminated:
containerID: containerd://3bfddceb18e976566f39625fde9f6203c4d06f48f91264b401c945c19712e98d
exitCode: 0
finishedAt: "2023-03-30T17:56:24Z"
reason: Completed
startedAt: "2023-03-30T17:55:26Z"
hostIP: 172.20.0.6
initContainerStatuses:
- containerID: containerd://78f7523609b616ab91b115721d7db40473b656a2d0d40c97477263e87b734aa2
image: docker.io/calico/cni:v3.25.0
imageID: docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
lastState: {}
name: upgrade-ipam
ready: true
restartCount: 2
state:
terminated:
containerID: containerd://78f7523609b616ab91b115721d7db40473b656a2d0d40c97477263e87b734aa2
exitCode: 0
finishedAt: "2023-03-30T17:56:50Z"
reason: Completed
startedAt: "2023-03-30T17:56:50Z"
- containerID: containerd://4ff6b96980023828a85e1527297dd07de61a9cfdc5692fce0635d9b39a76ee5f
image: docker.io/calico/cni:v3.25.0
imageID: docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
lastState: {}
name: install-cni
ready: true
restartCount: 1
state:
terminated:
containerID: containerd://4ff6b96980023828a85e1527297dd07de61a9cfdc5692fce0635d9b39a76ee5f
exitCode: 0
finishedAt: "2023-03-30T17:55:25Z"
reason: Completed
startedAt: "2023-03-30T17:55:24Z"
- containerID: containerd://32f143e7f1eb8f81b146941d21f93225c15bdece80f2d5ef39bd862fc52c96f3
image: docker.io/calico/node:v3.25.0
imageID: docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
lastState: {}
name: mount-bpffs
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://32f143e7f1eb8f81b146941d21f93225c15bdece80f2d5ef39bd862fc52c96f3
exitCode: 0
finishedAt: "2023-03-30T17:55:25Z"
reason: Completed
startedAt: "2023-03-30T17:55:25Z"
phase: Succeeded
podIP: 172.20.0.6
podIPs:
- ip: 172.20.0.6
qosClass: Burstable
startTime: "2023-03-30T17:47:26Z"
$ kubectl -n kube-system describe pod calico-node-kf69z
Name: calico-node-kf69z
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: talos-default-worker-2/172.20.0.6
Start Time: Thu, 30 Mar 2023 21:47:26 +0400
Labels: controller-revision-hash=6c6c65fb6c
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Succeeded
IP: 172.20.0.6
IPs:
IP: 172.20.0.6
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: containerd://78f7523609b616ab91b115721d7db40473b656a2d0d40c97477263e87b734aa2
Image: docker.io/calico/cni:v3.25.0
Image ID: docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 30 Mar 2023 21:56:50 +0400
Finished: Thu, 30 Mar 2023 21:56:50 +0400
Ready: True
Restart Count: 2
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
install-cni:
Container ID: containerd://4ff6b96980023828a85e1527297dd07de61a9cfdc5692fce0635d9b39a76ee5f
Image: docker.io/calico/cni:v3.25.0
Image ID: docker.io/calico/cni@sha256:a38d53cb8688944eafede2f0eadc478b1b403cefeff7953da57fe9cd2d65e977
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/install
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 30 Mar 2023 21:55:24 +0400
Finished: Thu, 30 Mar 2023 21:55:25 +0400
Ready: True
Restart Count: 1
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
mount-bpffs:
Container ID: containerd://32f143e7f1eb8f81b146941d21f93225c15bdece80f2d5ef39bd862fc52c96f3
Image: docker.io/calico/node:v3.25.0
Image ID: docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
Port: <none>
Host Port: <none>
Command:
calico-node
-init
-best-effort
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 30 Mar 2023 21:55:25 +0400
Finished: Thu, 30 Mar 2023 21:55:25 +0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/nodeproc from nodeproc (ro)
/sys/fs from sys-fs (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
Containers:
calico-node:
Container ID: containerd://3bfddceb18e976566f39625fde9f6203c4d06f48f91264b401c945c19712e98d
Image: docker.io/calico/node:v3.25.0
Image ID: docker.io/calico/node@sha256:a85123d1882832af6c45b5e289c6bb99820646cb7d4f6006f98095168808b1e6
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 30 Mar 2023 21:55:26 +0400
Finished: Thu, 30 Mar 2023 21:56:24 +0400
Ready: False
Restart Count: 1
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
CALICO_IPV6POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_VXLANMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_WIREGUARDMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_HEALTHENABLED: true
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/sys/fs/bpf from bpffs (rw)
/var/lib/calico from var-lib-calico (rw)
/var/log/calico/cni from cni-log-dir (ro)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r9p4 (ro)
Conditions:
Type Status
DisruptionTarget True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
sys-fs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/
HostPathType: DirectoryOrCreate
bpffs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/bpf
HostPathType: Directory
nodeproc:
Type: HostPath (bare host directory volume)
Path: /proc
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
cni-log-dir:
Type: HostPath (bare host directory volume)
Path: /var/log/calico/cni
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
kube-api-access-9r9p4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned kube-system/calico-node-kf69z to talos-default-worker-2
Normal Pulling 16m kubelet Pulling image "docker.io/calico/cni:v3.25.0"
Normal Pulled 16m kubelet Successfully pulled image "docker.io/calico/cni:v3.25.0" in 2.574137942s (2.574164652s including waiting)
Normal Created 16m kubelet Created container upgrade-ipam
Normal Started 16m kubelet Started container upgrade-ipam
Normal Pulled 16m kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
Normal Created 16m kubelet Created container install-cni
Normal Started 16m kubelet Started container install-cni
Normal Pulling 16m kubelet Pulling image "docker.io/calico/node:v3.25.0"
Normal Pulled 16m kubelet Successfully pulled image "docker.io/calico/node:v3.25.0" in 2.909875156s (2.909886566s including waiting)
Normal Created 16m kubelet Created container mount-bpffs
Normal Started 16m kubelet Started container mount-bpffs
Normal Pulled 16m kubelet Container image "docker.io/calico/node:v3.25.0" already present on machine
Normal Created 16m kubelet Created container calico-node
Normal Started 16m kubelet Started container calico-node
Warning Unhealthy 16m (x2 over 16m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning Unhealthy 16m kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Normal Killing 10m kubelet Stopping container calico-node
Normal SandboxChanged 9m6s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 9m6s kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
Normal Created 9m6s kubelet Created container upgrade-ipam
Normal Started 9m6s kubelet Started container upgrade-ipam
Normal Pulled 8m34s (x2 over 9m5s) kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
Normal Created 8m34s (x2 over 9m5s) kubelet Created container install-cni
Normal Started 8m34s (x2 over 9m5s) kubelet Started container install-cni
Normal Pulled 8m33s kubelet Container image "docker.io/calico/node:v3.25.0" already present on machine
Normal Created 8m33s kubelet Created container mount-bpffs
Normal Started 8m33s kubelet Started container mount-bpffs
Normal Pulled 8m32s kubelet Container image "docker.io/calico/node:v3.25.0" already present on machine
Normal Created 8m32s kubelet Created container calico-node
Normal Started 8m32s kubelet Started container calico-node
Warning Unhealthy 8m30s (x2 over 8m31s) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 8m26s kubelet Readiness probe failed: 2023-03-30 17:55:32.710 [INFO][227] confd/health.go 180: Number of node(s) with BGP peering established = 3
calico/node is not ready: BIRD is not ready: BGP not established with 172.20.0.2
Warning Unhealthy 8m16s kubelet Readiness probe failed: 2023-03-30 17:55:42.727 [INFO][249] confd/health.go 180: Number of node(s) with BGP peering established = 3
calico/node is not ready: BIRD is not ready: BGP not established with 172.20.0.2
Normal Killing 7m34s kubelet Stopping container calico-node
Normal SandboxChanged 7m8s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 7m8s kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
Normal Created 7m8s kubelet Created container upgrade-ipam
Normal Started 7m8s kubelet Started container upgrade-ipam
$ kubectl describe ds -n kube-system calico-node
Name: calico-node
Selector: k8s-app=calico-node
Node-Selector: kubernetes.io/os=linux
Labels: k8s-app=calico-node
Annotations: deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 5
Current Number of Nodes Scheduled: 5
Number of Nodes Scheduled with Up-to-date Pods: 5
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status: 3 Running / 0 Waiting / 2 Succeeded / 0 Failed
Pod Template:
Labels: k8s-app=calico-node
Service Account: calico-node
Init Containers:
upgrade-ipam:
Image: docker.io/calico/cni:v3.25.0
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
install-cni:
Image: docker.io/calico/cni:v3.25.0
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/install
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
mount-bpffs:
Image: docker.io/calico/node:v3.25.0
Port: <none>
Host Port: <none>
Command:
calico-node
-init
-best-effort
Environment: <none>
Mounts:
/nodeproc from nodeproc (ro)
/sys/fs from sys-fs (rw)
/var/run/calico from var-run-calico (rw)
Containers:
calico-node:
Image: docker.io/calico/node:v3.25.0
Port: <none>
Host Port: <none>
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
CALICO_IPV6POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_VXLANMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_WIREGUARDMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_HEALTHENABLED: true
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/sys/fs/bpf from bpffs (rw)
/var/lib/calico from var-lib-calico (rw)
/var/log/calico/cni from cni-log-dir (ro)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
sys-fs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/
HostPathType: DirectoryOrCreate
bpffs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/bpf
HostPathType: Directory
nodeproc:
Type: HostPath (bare host directory volume)
Path: /proc
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
cni-log-dir:
Type: HostPath (bare host directory volume)
Path: /var/log/calico/cni
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
Priority Class Name: system-node-critical
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 49m daemonset-controller Created pod: calico-node-pngj7
Normal SuccessfulCreate 49m daemonset-controller Created pod: calico-node-kf69z
Normal SuccessfulCreate 49m daemonset-controller Created pod: calico-node-lpkxj
Normal SuccessfulCreate 49m daemonset-controller Created pod: calico-node-hm7pf
Normal SuccessfulCreate 49m daemonset-controller Created pod: calico-node-hvws2
Kubernetes version
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.0-rc.0", GitCommit:"cf60ee25590ac79334f823b7f03cf33105a442c1", GitTreeState:"clean", BuildDate:"2023-03-23T18:59:06Z", GoVersion:"go1.20.2", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
Talos Linux 1.4.0-alpha.3
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 3
- Comments: 19 (19 by maintainers)
This is because of `restartPolicy: Always`. With that policy a pod should never reach the `Succeeded` phase; the kubelet just recreates the containers. However, once a pod is actually marked `Succeeded` (as happens here via graceful shutdown), the kubelet stops looking at it, and it then becomes the responsibility of the DaemonSet controller to recreate the pod.
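The stranded state described above can be detected mechanically. A minimal sketch in Python, assuming pods are decoded from `kubectl get pods -o json` into dicts (the helper name and sample pod objects are hypothetical):

```python
# Detect DaemonSet pods stranded in the Succeeded phase: with
# restartPolicy: Always the kubelet will no longer restart such a pod,
# and (at the time of this report) the DaemonSet controller did not
# replace it either.
def is_stranded_daemonset_pod(pod: dict) -> bool:
    owned_by_daemonset = any(
        ref.get("kind") == "DaemonSet" and ref.get("controller")
        for ref in pod.get("metadata", {}).get("ownerReferences", [])
    )
    return (
        owned_by_daemonset
        and pod.get("spec", {}).get("restartPolicy") == "Always"
        and pod.get("status", {}).get("phase") == "Succeeded"
    )

# Hypothetical pod objects mirroring the shapes seen in the dumps above.
stuck = {
    "metadata": {"ownerReferences": [{"kind": "DaemonSet", "controller": True}]},
    "spec": {"restartPolicy": "Always"},
    "status": {"phase": "Succeeded"},
}
healthy = {
    "metadata": {"ownerReferences": [{"kind": "DaemonSet", "controller": True}]},
    "spec": {"restartPolicy": "Always"},
    "status": {"phase": "Running"},
}
print(is_stranded_daemonset_pod(stuck), is_stranded_daemonset_pod(healthy))  # True False
```

Deleting any pod for which this predicate holds (the manual workaround above) forces the DaemonSet controller to create a fresh replacement.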