kubernetes: Pods getting killed after startup probe failure
What happened?
Hi! I can see my pods are getting killed post probe failure. Firstly, I can see these two odd logs in my syslog:
networkd-dispatcher[781]: WARNING:Unknown index 56 seen, reloading interface list
systemd-udevd[455717]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Next, I see my probes getting failed for multiple pods like
Startup probe for "pod1:app" failed (failure): Get http://100.125.215.114:8081/ping: dial tcp 100.125.215.114:8081: connect: connection refused
Next, I see the pod getting killed by kubelet
kubelet[11767]: I0831 03:58:09.959675 11767 kubelet.go:1926] SyncLoop (DELETE, "api"): kubelet[11767]: I0831 03:58:09.959996 11767 kuberuntime_container.go:648] Killing container "" with 30second grace period
I’m using Ubuntu 20.04 and Kubernetes 1.18.9 . I checked the Ubuntu errors and found this report - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1775131
However, there’s no solution/lead mentioned here.
What did you expect to happen?
- The networkd and systemd log should not have come
- The pod shouldn’t have been killed post probe failure.
- Multiple probes shouldn’t have failed at once on the same node.
How can we reproduce it (as minimally and precisely as possible)?
This seems intermittent and I don’t think we can recreate it.
Anything else we need to know?
No response
Kubernetes version
1.18.9
Cloud provider
AWS
OS version
Ubuntu 20.04
Install tools
No response
Container runtime (CRI) and version (if applicable)
CRI: Docker
Related plugins (CNI, CSI, …) and versions (if applicable)
CNI: Calico
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (12 by maintainers)
Thanks for reporting this issue @tarun-asthana. You may update your cluster to supported versions. And if this issue is reproduced again, please give us full logs of kubelet and kube-controller-manager. Also the right reproducible scenario would be very helpful for us.
For now, I think there are little things we can do as there are no remaining logs or a reproducible scenario. As far as I confirmed, there are no critical bugs in the probe failure of the supported versions.
/triage needs-information
@gjkim42 I can confirm that all pods on the node were not getting killed. Neither was the node terminated. This is not expected behaviour. I’ve opened the ticket with Ubuntu - https://bugs.launchpad.net/ubuntu/+bug/1988661