kubernetes: node-kubelet-conformance fails with "pod ... was not deleted"
Which jobs are failing:
ci-kubernetes-node-kubelet-conformance
Which test(s) are failing:
- Variable Expansion should succeed in writing subpaths in container
- Variable Expansion should verify that a failing subpath expansion can be modified during the lifecycle of a container
Since when has it been failing:
Since Fri Jul 10 15:41:46 2020 -0700, when this PR was merged
Testgrid link:
https://k8s-testgrid.appspot.com/sig-node-kubelet#node-kubelet-conformance
Reason for failure:
The test pod is deleted with the DeletePropagationForeground option, which was introduced by this commit.
Anything else we need to know:
The test pod stays stuck in the Terminating state until the 5-minute timeout of DeletePodWithWait expires. This happens with an empty pod as well. Here is a simplified ginkgo test case that triggers the failure:
```go
framework.ConformanceIt("trigger stuck pod deletion", func() {
	// newPod is the test file's helper (command, env, mounts, volumes); this pod only sleeps.
	pod := newPod([]string{"sh", "-c", "sleep 600"}, nil, nil, nil)

	ginkgo.By("creating the pod")
	podClient := f.PodClient()
	pod = podClient.Create(pod)

	ginkgo.By("waiting for pod running")
	err := e2epod.WaitTimeoutForPodRunningInNamespace(f.ClientSet, pod.Name, pod.Namespace, framework.PodStartShortTimeout)
	framework.ExpectNoError(err, "while waiting for pod to be running")

	ginkgo.By("deleting the pod gracefully")
	// This is where the test hangs: the pod stays in Terminating until the wait times out.
	err = e2epod.DeletePodWithWait(f.ClientSet, pod)
	framework.ExpectNoError(err, "failed to delete pod")
})
```
The pod phase is "Failed" and the container's termination reason is "Error", with exit code 137:
```console
# kubectl get pod -n var-expansion-1646 var-expansion-f5a6d071-e1c0-40a6-9292-49807b1f862f -o yaml |grep -i -B15 fail
    image: busybox:1.29
    imageID: docker-pullable://busybox@sha256:8ccbac733d19c0dd4d70b4f0c1e12245b5fa3ad24758a11035ee505c629c0796
    lastState: {}
    name: dapi-container
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://7b60220b917510a6819d04774ebf7be660a67cc85ceea5e99174322b26190910
        exitCode: 137
        finishedAt: "2020-07-21T16:35:39Z"
        reason: Error
        startedAt: "2020-07-21T16:35:07Z"
  hostIP: 10.237.72.179
  phase: Failed
```
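Exit code 137 is 128 + 9, which by the usual shell convention means the process was terminated by signal 9 (SIGKILL). That would fit the container being force-killed during deletion, though the report does not state this. A small stdlib-only sketch of the decoding, using a hypothetical `describeExitCode` helper that knows only a few Linux signal numbers:

```go
package main

import "fmt"

// signalNames covers only a few common Linux signal numbers (hypothetical subset).
var signalNames = map[int]string{2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}

// describeExitCode interprets a container exit code using the shell convention
// that a status of 128+N means the process was terminated by signal N.
func describeExitCode(code int) string {
	if code > 128 {
		sig := code - 128
		name, ok := signalNames[sig]
		if !ok {
			name = "unknown"
		}
		return fmt.Sprintf("terminated by signal %d (%s)", sig, name)
	}
	return fmt.Sprintf("exited with status %d", code)
}

func main() {
	fmt.Println(describeExitCode(137)) // terminated by signal 9 (SIGKILL)
}
```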
Reverting the above commit or changing the delete option to DeletePropagationBackground should fix this.
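The proposed fix amounts to a different `propagationPolicy` in the delete request. In the Kubernetes API, Foreground deletion keeps the object visible, with a `foregroundDeletion` finalizer, until the garbage collector has deleted its dependents and removed that finalizer; Background deletion does not wait on dependents, and the garbage collector cleans them up afterwards. If nothing removes that finalizer, the pod would remain in Terminating, which would match the symptom in this report, but that is an interpretation rather than something confirmed here. The sketch builds the JSON request body with a hand-written `deleteOptions` struct that mirrors only a few fields of meta/v1 DeleteOptions; it is not the client-go type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions is a hypothetical stand-in mirroring a subset of the
// Kubernetes meta/v1 DeleteOptions fields; it is not the client-go type.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteBody returns the JSON body of a delete request with the given policy.
func deleteBody(policy string) (string, error) {
	b, err := json.Marshal(deleteOptions{Kind: "DeleteOptions", APIVersion: "v1", PropagationPolicy: policy})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Foreground: what the failing tests use. Background: the proposed fix.
	for _, p := range []string{"Foreground", "Background"} {
		body, err := deleteBody(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(body)
	}
}
```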
About this issue
- State: closed
- Created 4 years ago
- Comments: 17 (17 by maintainers)
Putting my @BenTheElder hat on: probably not? More hour-plus tests that run on every PR are a non-goal.