kubernetes: [Flaky Test] [sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]

Which jobs are flaking:

ci-kubernetes-kind-e2e-parallel

Which test(s) are flaking:

[sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]

Testgrid link:

https://testgrid.k8s.io/sig-release-master-blocking#kind-master-parallel&include-filter-by-regex=HostPort validates that there is no conflict

Reason for failure:

/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:630
Mar 12 18:37:32.136: Failed to connect to exposed host ports
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes/runner.go:113

Anything else we need to know:

  • links to go.k8s.io/triage appreciated
  • links to specific failures in spyglass appreciated

Example Spyglass log (linked to start of this test): https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-kind-e2e-parallel/1370436989907636224#1:build-log.txt:32060

Other Spyglass failures:

Triage: https://storage.googleapis.com/k8s-gubernator/triage/index.html?test=HostPort validates that there is no conflict between pods#62aebe1e6cd0a1e26162

/sig network
/priority important-soon

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 18 (17 by maintainers)

Most upvoted comments

Looks like another one a few hours ago: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-kind-e2e-parallel/1371419397389815808

From the node logs at https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-kind-e2e-parallel/1371419397389815808/artifacts/logs/kind-worker/kubelet.log:

Mar 15 11:32:10 kind-worker kubelet[244]: E0315 11:32:10.382627     244 remote_runtime.go:334] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"23e2e5eb4ede2e9281768b5108ff59bf1339284670043c9fdc5655355444d868\": not found" containerID="23e2e5eb4ede2e9281768b5108ff59bf1339284670043c9fdc5655355444d868"
Mar 15 11:32:10 kind-worker kubelet[244]: E0315 11:32:10.382753     244 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Mar 15 11:32:10 kind-worker kubelet[244]: goroutine 3224 [running]:
Mar 15 11:32:10 kind-worker kubelet[244]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x41ce400, 0x737a320)
Mar 15 11:32:10 kind-worker kubelet[244]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
Mar 15 11:32:10 kind-worker kubelet[244]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
Mar 15 11:32:10 kind-worker kubelet[244]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
Mar 15 11:32:10 kind-worker kubelet[244]: panic(0x41ce400, 0x737a320)
Mar 15 11:32:10 kind-worker kubelet[244]:         /usr/local/go/src/runtime/panic.go:965 +0x1b9
Mar 15 11:32:10 kind-worker kubelet[244]: k8s.io/kubernetes/pkg/kubelet/kuberuntime.(*kubeGenericRuntimeManager).killContainersWithSyncResult.func1(0xc0012c31a0, 0xc000d3c160, 0x0, 0x0, 0xc002161200, 0xc0001f7ab0)
Mar 15 11:32:10 kind-worker kubelet[244]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_container.go:690 +0x33a
Mar 15 11:32:10 kind-worker kubelet[244]: created by k8s.io/kubernetes/pkg/kubelet/kuberuntime.(*kubeGenericRuntimeManager).killContainersWithSyncResult
Mar 15 11:32:10 kind-worker kubelet[244]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_container.go:683 +0x105

This does not appear to be a klog panic; I'm not sure whether it's related.
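The stack trace is consistent with a CRI status lookup that returns NotFound together with a nil result, followed by a field access on that nil value. Below is a minimal sketch of that failure mode and a guarded variant; the types and function names are simplified placeholders, not the kubelet's real ContainerStatus/killContainer code:

```go
// Minimal sketch of the failure mode suggested by the stack trace above, not
// the actual kubelet code: a CRI ContainerStatus call returns a NotFound error
// together with a nil status, and a later field access on that nil status
// panics with "invalid memory address or nil pointer dereference".
package main

import (
	"errors"
	"fmt"
)

type containerStatus struct {
	StartedAt int64
}

var errNotFound = errors.New("rpc error: code = NotFound desc = container not found")

// containerStatusFromRuntime stands in for the remote runtime call; it returns
// (nil, err) when the container has already been removed.
func containerStatusFromRuntime(id string) (*containerStatus, error) {
	return nil, errNotFound
}

func killContainerUnsafe(id string) {
	status, _ := containerStatusFromRuntime(id)
	// BUG: the error is ignored and status is nil, so this dereference panics,
	// matching the kubelet goroutine in the log above.
	fmt.Println("started at", status.StartedAt)
}

func killContainerSafe(id string) error {
	status, err := containerStatusFromRuntime(id)
	if err != nil {
		// Treat a vanished container as already stopped instead of panicking.
		return fmt.Errorf("skipping kill, container %s: %w", id, err)
	}
	fmt.Println("started at", status.StartedAt)
	return nil
}

func main() {
	if err := killContainerSafe("23e2e5eb4ede"); err != nil {
		fmt.Println(err)
	}
	// killContainerUnsafe("23e2e5eb4ede") // would panic like the kubelet goroutine above
}
```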

/assign
I have to remember how the kubelet -> CRI -> CNI portmap flow works exactly; I think there were some problems with retries 🤔

In this case, the pod that caused the test to fail had to be retried by the kubelet:

Mar 15 11:36:50 kind-worker containerd[173]: time="2021-03-15T11:36:50.881066433Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:pod2,Uid:4c5244f4-dda9-4ef9-b45a-7670718b6b17,Namespace:hostport-5025,Attempt:0,} failed, error" error="failed to start sandbox container task \"8c63883ddd77e46d8b57b4a9cf5fc07237cca622317412cb0d44715e395ccd3d\": context deadline exceeded: unknown"

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-kind-e2e-parallel/1371419397389815808/artifacts/logs/kind-worker/containerd.log

and later the pod runs:

Mar 15 11:37:02 kind-worker containerd[173]: time="2021-03-15T11:37:02.767356802Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:pod2,Uid:4c5244f4-dda9-4ef9-b45a-7670718b6b17,Namespace:hostport-5025,Attempt:0,} returns sandbox id \"2e79c5eed66140b5849bc6c1075545d185c6bbd78afec8fee82b819ec975a5b6\""

I need to verify that the portmap info is handled correctly after the retry; that may explain the issue here.
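One way a retry could break host ports is stale mapping bookkeeping: if the failed sandbox attempt claims the (hostIP, hostPort, protocol) tuple and never releases it, the successful retry cannot install its own mapping and traffic never reaches the new sandbox. The sketch below illustrates that concern only; it uses placeholder types, not the real CNI portmap plugin or kubelet hostport manager:

```go
// Minimal sketch of the bookkeeping concern raised above, not the real
// portmap implementation: a host-port mapping claimed by a failed sandbox
// attempt must be released (or reclaimed) before the retried sandbox can
// take it over.
package main

import "fmt"

type portKey struct {
	HostIP   string
	HostPort int
	Protocol string // "TCP" or "UDP"
}

type hostPortManager struct {
	owners map[portKey]string // mapping -> sandbox ID that owns it
}

func newHostPortManager() *hostPortManager {
	return &hostPortManager{owners: map[portKey]string{}}
}

// Claim installs a mapping for sandboxID, failing if another sandbox owns it.
func (m *hostPortManager) Claim(k portKey, sandboxID string) error {
	if owner, ok := m.owners[k]; ok && owner != sandboxID {
		return fmt.Errorf("%v already mapped to sandbox %s", k, owner)
	}
	m.owners[k] = sandboxID
	return nil
}

// Release must run when a sandbox attempt is torn down; otherwise a retry
// with a new sandbox ID is rejected even though nothing is listening.
func (m *hostPortManager) Release(k portKey, sandboxID string) {
	if m.owners[k] == sandboxID {
		delete(m.owners, k)
	}
}

func main() {
	m := newHostPortManager()
	k := portKey{HostIP: "0.0.0.0", HostPort: 54323, Protocol: "UDP"} // hypothetical values

	// First RunPodSandbox attempt claims the port, then times out
	// (sandbox 8c63883d... in the containerd log above).
	_ = m.Claim(k, "8c63883ddd77")
	// If this Release is skipped for the failed attempt...
	// m.Release(k, "8c63883ddd77")

	// ...the retried sandbox (2e79c5ee... above) cannot take over the mapping.
	if err := m.Claim(k, "2e79c5eed661"); err != nil {
		fmt.Println("retry blocked:", err)
	}
}
```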