kubernetes: Failing Test: [sig-node] Probing container should be restarted with an exec liveness probe with timeout

Which jobs are failing:

https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-kubelet-1-19-on-latest

this is a skew job:

  • kubelet is at the HEAD of release-1.19
  • apiserver / test suite are at HEAD of the master branch (to be released as 1.21)

Which test(s) are failing:

Kubernetes e2e suite.[sig-node] Probing container should be restarted with an exec liveness probe with timeout [NodeConformance] [Conformance]

the test is here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/common/node/container_probe.go#L212-L226

Since when has it been failing:

Since the job was added last week.

Testgrid link:

https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-kubelet-1-19-on-latest

Reason for failure:

this condition does not pass:

Description: A Pod is created with liveness probe with an Exec action on the Pod. If the liveness probe call does not return within the timeout specified, liveness probe MUST restart the Pod.

Anything else we need to know:

i haven’t looked into it yet, but this could be a code failure or just a problem in the test suite at master/HEAD.

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-kubelet-1-19-on-latest/1368323279995015168

/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:630
Mar  6 22:36:02.848: pod container-probe-1648/busybox-ce66c80f-44c7-48d1-bb91-e9d1902893c3 - expected number of restarts: 1, found restarts: 0
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/common/node/container_probe.go:225

looks like the probe is not restarting the pod.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

i don’t think i can come up with a tag name that everybody would like, but e.g. something like [KubeletMaxSkewN-1] would mean that we skip this test when we are running an N-2 job.

I think we’d want a tag that we wouldn’t need to change in future releases… something like [MinimumKubelet:1.20]

Then the job testing against 1.19 kubelets could -skip '\[MinimumKubelet:(1.20|1.21)\]'
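To make the matching concrete, here is a minimal sketch in Python of how such a skip pattern would select tests by name. The `[MinimumKubelet:...]` tag is only the proposal above, and the test names and the `should_skip` helper are hypothetical; the real e2e runner applies the regex through its own skip option. The sketch escapes the dots in the version numbers, which the pattern above leaves unescaped.

```python
import re

# Skip pattern for a job running 1.19 kubelets: exclude any test that
# declares a minimum kubelet version of 1.20 or 1.21.
# The dots are escaped so they only match a literal ".".
SKIP_PATTERN = re.compile(r"\[MinimumKubelet:(1\.20|1\.21)\]")


def should_skip(test_name: str) -> bool:
    """Return True if the test name carries a tag matched by the skip pattern."""
    return SKIP_PATTERN.search(test_name) is not None


# Hypothetical test names, used only to illustrate the matching.
tests = [
    "[sig-node] Example probe test [MinimumKubelet:1.20] [NodeConformance]",
    "[sig-node] Example test without a version tag [NodeConformance]",
    "[sig-node] Example test [MinimumKubelet:1.19]",
]

for name in tests:
    print(should_skip(name), name)
```

Only the first name is skipped. Note that the alternation lists versions explicitly, so under this scheme the skew job's pattern would presumably need a new entry whenever a tag with a newer minimum version appears.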

I agree with #99909 (comment) and would like to see a PR to k/community describing this new tag (look for where Feature: is defined)

https://github.com/kubernetes/community/pull/5622

that test was promoted to conformance in 1.21 in https://github.com/kubernetes/kubernetes/pull/97619

it exercises a specific bug that was fixed in 1.20 kubelets (https://github.com/kubernetes/kubernetes/pull/94115). the fix modified behavior in a user-facing way, so it wasn’t backported, which means a 1.19 node won’t pass that test
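The difference can be illustrated with a toy model in Python. This is not kubelet code: it assumes, based on the test name, that the bug in question is the exec probe timeout not being enforced, and the `exec_probe_succeeds` helper and its `enforce_timeout` flag are hypothetical names used only for this sketch.

```python
import subprocess
import sys


def exec_probe_succeeds(command, timeout_seconds, enforce_timeout):
    """Run an exec-style probe and return True if it counts as a success.

    enforce_timeout=True models a node that honors the probe timeout.
    enforce_timeout=False models a node that waits for the command to
    finish and only looks at its exit code (the assumed old behavior).
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds if enforce_timeout else None,
        )
    except subprocess.TimeoutExpired:
        # The command ran past the timeout: the probe fails, which for a
        # liveness probe would lead to a container restart.
        return False
    return result.returncode == 0


# A probe command that exits 0 but takes longer than the timeout.
slow_probe = [sys.executable, "-c", "import time; time.sleep(1)"]

print(exec_probe_succeeds(slow_probe, 0.2, enforce_timeout=True))   # False
print(exec_probe_succeeds(slow_probe, 0.2, enforce_timeout=False))  # True
```

With the timeout enforced, the slow probe fails and a restart would follow, which is what the test expects. Without enforcement the probe is reported as successful, which matches the "expected number of restarts: 1, found restarts: 0" failure seen on the 1.19 node.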