kubernetes: Pods getting killed after startup probe failure

What happened?

Hi! My pods are getting killed after a startup probe failure. First, I see these two odd entries in my syslog:

networkd-dispatcher[781]: WARNING:Unknown index 56 seen, reloading interface list
systemd-udevd[455717]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

Next, I see probes failing for multiple pods, like:

Startup probe for "pod1:app" failed (failure): Get http://100.125.215.114:8081/ping: dial tcp 100.125.215.114:8081: connect: connection refused

Then I see the pod being killed by kubelet:

kubelet[11767]: I0831 03:58:09.959675 11767 kubelet.go:1926] SyncLoop (DELETE, "api"):
kubelet[11767]: I0831 03:58:09.959996 11767 kuberuntime_container.go:648] Killing container "" with 30second grace period

I’m using Ubuntu 20.04 and Kubernetes 1.18.9. I searched the known Ubuntu issues and found this report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1775131 However, no solution or lead is mentioned there.
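For context, the "connection refused" in the kubelet log means the probe's HTTP GET could not even open a TCP connection to the pod. A minimal sketch of such a check (my own illustration, not kubelet's actual code; the URL in the comment is the one from the logs above):

```python
import urllib.request
import urllib.error

def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Roughly what an HTTP startup probe does: a GET that must succeed
    within the timeout. Connection errors (e.g. 'connection refused', as
    in the kubelet log above) and HTTP error statuses count as failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False

# Running this from a node while the pod is up would mimic the probe:
# http_probe("http://100.125.215.114:8081/ping")
```

A quick manual check like this from the affected node can help distinguish an application that never started listening from a node-level networking problem (which the networkd/udev log entries above might hint at).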

What did you expect to happen?

  1. The networkd and systemd log entries should not have appeared.
  2. The pod should not have been killed after the probe failure.
  3. Multiple probes should not have failed at once on the same node.
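Regarding expectation 2: killing the container once a startup probe exhausts its failureThreshold is expected kubelet behavior; the open question is why the probes failed at all. A hypothetical illustration of the timeline (the probe settings here are made-up example values, not taken from this issue; only the 30-second grace period comes from the kubelet log):

```python
def seconds_until_kill(period_seconds: int, failure_threshold: int,
                       grace_period: int = 30) -> int:
    """Illustrative only: a startup probe that keeps failing is allowed
    roughly period_seconds * failure_threshold seconds before kubelet
    kills the container, which then gets the termination grace period
    (the '30second grace period' seen in the kubelet log)."""
    return period_seconds * failure_threshold + grace_period

# With periodSeconds=10 and failureThreshold=3 (example values), the
# container is killed about 30s after probing starts, then has 30 more
# seconds to shut down:
seconds_until_kill(10, 3)  # 60
```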

How can we reproduce it (as minimally and precisely as possible)?

This seems intermittent and I don’t think we can recreate it.

Anything else we need to know?

No response

Kubernetes version

1.18.9

Cloud provider

AWS

OS version

Ubuntu 20.04

Install tools

No response

Container runtime (CRI) and version (if applicable)

CRI: Docker

Related plugins (CNI, CSI, …) and versions (if applicable)

CNI: Calico

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 17 (12 by maintainers)

Most upvoted comments

Thanks for reporting this issue @tarun-asthana. Please update your cluster to a supported version, and if the issue recurs, give us the full kubelet and kube-controller-manager logs. A reliable reproduction scenario would also be very helpful.

For now, there is little we can do, as there are no remaining logs and no reproducible scenario. As far as I could confirm, there are no critical probe-failure bugs in the supported versions.

/triage needs-information

@gjkim42 I can confirm that not all pods on the node were killed, and the node itself was not terminated. This is not expected behaviour. I’ve opened a ticket with Ubuntu: https://bugs.launchpad.net/ubuntu/+bug/1988661