kubernetes: [failing test] "Pod Container Status should never report success for a pending container" hangs and times out KIND jobs
I’ve checked some jobs that timed out and something is odd with the test
[k8s.io] [sig-node] Pods Extended
^[[90mtest/e2e/framework/framework.go:592^[[0m
[k8s.io] Pod Container Status
^[[90mtest/e2e/framework/framework.go:592^[[0m
should never report success for a pending container
Check the durations in
[32m• [SLOW TEST:3484.128 seconds][0m
[k8s.io] [sig-node] Pods Extended
[90mtest/e2e/framework/framework.go:592[0m
[k8s.io] Pod Container Status
[90mtest/e2e/framework/framework.go:592[0m
should never report success for a pending container
[90mtest/e2e/node/pods.go:219[0m
May 28 21:05:05.238: INFO: Pod pod-submit-status-0-7 on node kind-worker2 timings total=56m0.34898024s t=1.832s run=0s execute=0s
[32m• [SLOW TEST:3376.269 seconds][0m
[k8s.io] [sig-node] Pods Extended
[90mtest/e2e/framework/framework.go:592[0m
[k8s.io] Pod Container Status
[90mtest/e2e/framework/framework.go:592[0m
should never report success for a pending container
[90mtest/e2e/node/pods.go:219[0m
May 28 18:08:32.338: INFO: Pod pod-submit-status-0-13 on node kind-worker2 timings total=54m38.072272958s t=1.751s run=0s execute=0s
_Originally posted by @aojea in https://github.com/kubernetes/kubernetes/pull/91491#issuecomment-635651958_
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 31 (31 by maintainers)
This can probably be closed out, but should spawn an issue for containerd that we’ll need to drive
https://github.com/opencontainers/runc/pull/2338 might fix the issue, though the PR is originally intended for cgroup2+systemd.
https://prow.k8s.io/?job=pull-kubernetes-e2e-kind looks greener already:
Success rate over time: 3h: 75%, 12h: 53%, 48h: 36%kind reverted to v1.3.3-14-g449e9269 in https://github.com/kubernetes-sigs/kind/pull/1649 around 2am
will watch https://storage.googleapis.com/k8s-gubernator/triage/index.html?ci=0&pr=1&job=kind&test=Pod Container Status should never report success for a pending container|CronJob should remove from active list jobs that have been deleted to see if signal clears up
but to v1.3.4-12-g1e902b2d, not the original version (v1.3.3-14-g449e9269)
looking at that delta now
https://storage.googleapis.com/k8s-gubernator/triage/index.html?ci=0&pr=1&job=kind#b05025d9cb729f714b09 is interesting… the containerd logs show the same error:
And appears to have started right around 5/26 - 5/27
Digging into https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-kind/1267615032988930051
The pod pod-submit-status-0-4 was never deleted:
Events:
Jun 2 01:00:54.440: INFO: At 2020-06-02 00:55:33 +0000 UTC - event for pod-submit-status-0-4: {default-scheduler } Scheduled: Successfully assigned pods-7342/pod-submit-status-0-4 to kind-worker2Jun 2 01:00:54.440: INFO: At 2020-06-02 00:55:35 +0000 UTC - event for pod-submit-status-0-4: {kubelet kind-worker2} Created: Created container busyboxJun 2 01:00:54.440: INFO: At 2020-06-02 00:55:35 +0000 UTC - event for pod-submit-status-0-4: {kubelet kind-worker2} Pulled: Container image "docker.io/library/busybox:1.29" already present on machineJun 2 01:00:54.440: INFO: At 2020-06-02 00:55:36 +0000 UTC - event for pod-submit-status-0-4: {kubelet kind-worker2} Started: Started container busyboxSearching the kubelet log for “pod-submit-status-0-4” shows repeated errors:
Jun 02 00:56:13 kind-worker2 kubelet[677]: E0602 00:56:13.036013 677 remote_runtime.go:563] ReopenContainerLog "583c17764ed521bc6e30482e10faff6f0cda6d5351cfdb9b942b905df0eec592" from runtime service failed: rpc error: code = Unknown desc = failed to create and open log file: open /var/log/pods/pods-7342_pod-submit-status-0-4_553b0f48-8dbd-4ff6-8e4c-289a0a9b7f5c/busybox/0.log: no such file or directoryJun 02 00:56:13 kind-worker2 kubelet[677]: E0602 00:56:13.036055 677 container_log_manager.go:210] Container "583c17764ed521bc6e30482e10faff6f0cda6d5351cfdb9b942b905df0eec592" log "/var/log/pods/pods-7342_pod-submit-status-0-4_553b0f48-8dbd-4ff6-8e4c-289a0a9b7f5c/busybox/0.log" doesn't exist, reopen container log failed: rpc error: code = Unknown desc = failed to create and open log file: open /var/log/pods/pods-7342_pod-submit-status-0-4_553b0f48-8dbd-4ff6-8e4c-289a0a9b7f5c/busybox/0.log: no such file or directoryJun 02 00:56:15 kind-worker2 kubelet[677]: E0602 00:56:15.717463 677 cri_stats_provider.go:591] Unable to fetch container log stats for path /var/log/pods/pods-7342_pod-submit-status-0-4_553b0f48-8dbd-4ff6-8e4c-289a0a9b7f5c/busybox: failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/log/pods/pods-7342_pod-submit-status-0-4_553b0f48-8dbd-4ff6-8e4c-289a0a9b7f5c/busybox with error exit status 1These repeat until the end of the log. None of the other test pods from the test log these errors.