kubernetes: [Failing Test] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container (ci-kubernetes-e2e-ubuntu-gce)

Which jobs are failing: ci-kubernetes-e2e-ubuntu-gce

Which test(s) are failing: [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container

Since when has it been failing: Flaking since https://github.com/kubernetes/kubernetes/compare/9223413a7...12d9183da (Nov 11-12); failing since https://github.com/kubernetes/kubernetes/compare/295010c30...5cfce4e5c (Nov 12, 1 PM PST)

Testgrid link: https://testgrid.k8s.io/sig-release-master-informing#gce-ubuntu-master-default

Reason for failure: Timeout

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/node/pods.go:206
Nov 13 09:51:47.661: timed out waiting for watch events for pod-submit-status-2-1
/usr/local/go/src/runtime/asm_amd64.s:1374
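
For reference, here is a minimal, hedged sketch (Go, using client-go) of the kind of watch-with-deadline loop that produces this failure mode. The function name waitForPodRunning and the running-phase check are illustrative assumptions, not the actual code at test/e2e/node/pods.go:206.

```go
package e2esketch

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForPodRunning watches a single pod and returns a "timed out waiting
// for watch events" style error if the expected event never arrives before
// the deadline, mirroring the failure reported in the log above.
func waitForPodRunning(client kubernetes.Interface, namespace, podName string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// Watch only the pod we care about.
	w, err := client.CoreV1().Pods(namespace).Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=" + podName,
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for {
		select {
		case ev, ok := <-w.ResultChan():
			if !ok {
				return fmt.Errorf("watch channel closed for %s", podName)
			}
			if pod, ok := ev.Object.(*v1.Pod); ok && pod.Status.Phase == v1.PodRunning {
				return nil // observed the event we were waiting for
			}
		case <-ctx.Done():
			// This is the shape of the failure seen in the test output.
			return fmt.Errorf("timed out waiting for watch events for %s", podName)
		}
	}
}
```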

Anything else we need to know: Example spyglass links:

Can’t find a triage link for this test on this job

/sig node /cc @kubernetes/ci-signal @kubernetes/sig-node-test-failures /priority important-soon

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 31 (31 by maintainers)


Most upvoted comments

@chrishenzie and I were debugging it; it seems something strange is happening with os.RemoveAll() on Ubuntu. @chrishenzie opened a test PR #96768 to verify our theory.
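
For illustration only (this is not the content of the linked test PR), here is a minimal Go sketch of one way to surface that kind of unexpected os.RemoveAll() behavior: verify the path is actually gone after the call returns nil. The path and helper name are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// removeAllAndVerify removes path recursively and reports whether anything
// survived the call, returning an error in either failure case.
func removeAllAndVerify(path string) error {
	if err := os.RemoveAll(path); err != nil {
		return fmt.Errorf("RemoveAll(%q) failed: %w", path, err)
	}
	// os.RemoveAll returns nil when the path does not exist, so any entry
	// still present here means the removal silently left something behind.
	if _, err := os.Stat(path); err == nil {
		return fmt.Errorf("RemoveAll(%q) returned nil but the path still exists", path)
	} else if !os.IsNotExist(err) {
		return fmt.Errorf("stat after RemoveAll(%q): %w", path, err)
	}
	return nil
}

func main() {
	// Hypothetical path used only for this sketch.
	if err := removeAllAndVerify("/tmp/example-secret-volume"); err != nil {
		fmt.Println(err)
	}
}
```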

So I think the reverted PR #96759 logic is correct, but it uncovered an issue that was hidden before.

Created https://github.com/kubernetes/kubernetes/issues/96759 to track the secret cleanup issue

If there’s a clear correlation with https://github.com/kubernetes/kubernetes/pull/84206, should we revert it, given how close we are to the release?

This PR was merged just before failures started: #84206

@mucahitkurt can you please take a look? This test failure appears to happen in exactly the logic that was changed.

I’ll try to check it.

cc @jingxu97 @msau42

To my earlier point about dockershim: I saw this fail on containerd jobs as well, so maybe it’s not specific to a container runtime.