kubernetes: [Failing Test] gce-windows-2019-master (ci-kubernetes-e2e-windows-gce-2019)
Which jobs are failing:
gce-windows-2019-master (ci-kubernetes-e2e-windows-gce-2019)
Which test(s) are failing:
[sig-cli] Kubectl client Guestbook application should create and stop a working application [Conformance]
Since when has it been failing:
18th March 09:47 PDT
Testgrid link: https://testgrid.k8s.io/sig-release-master-informing#gce-windows-2019-master
Reason for failure:
Full Stack Trace
k8s.io/kubernetes/test/e2e/kubectl.validateGuestbookApp(0x534a9e0, 0xc00300f8c0, 0xc001af36a0, 0xc)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubectl/kubectl.go:1858 +0x5d0
k8s.io/kubernetes/test/e2e/kubectl.glob..func1.7.2()
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubectl/kubectl.go:342 +0x165
k8s.io/kubernetes/test/e2e.RunE2ETests(0xc001a32300)
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:125 +0x324
k8s.io/kubernetes/test/e2e.TestE2E(0xc001a32300)
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e_test.go:119 +0x2b
testing.tRunner(0xc001a32300, 0x4ae80c8)
/usr/local/go/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:960 +0x350
STEP: using delete to clean up resources
Anything else we need to know:
/cc @kubernetes/ci-signal
/milestone v1.19
/priority important-soon
/assign @soltysh
/sig cli
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (15 by maintainers)
A bug was introduced a couple days ago that is preventing Windows clusters on GCE from starting up. Will try to fix that tomorrow.
So, I figured out what’s happening.
As you can see, the test is flaky. When it fails, the following error can be seen in the agnhost worker pods:
That means either the DNS name could not be resolved or there was a general network failure. More precisely, the container (the agnhost app) starts before pod networking has been fully configured, so its attempt to resolve agnhost-master fails.
This can be easily observed by modifying the worker pod (test/e2e/testing-manifests/guestbook/agnhost-slave-deployment.yaml.in) from:
to:
With that change, the test consistently passes.
Ideally, we would fix this issue by making sure that all the networking is properly set up before the container entrypoint starts. Alternatively, we could add a few retries to agnhost's guestbook subcommand.
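To illustrate the retry alternative, here is a minimal Go sketch of retrying a name lookup with a fixed delay. It is not the actual agnhost code: resolveWithRetry, lookupFunc, errNoAddrs, and the attempt and delay values are all hypothetical names and numbers chosen for the example. The stub in main simulates pod networking that only becomes ready on the third attempt; in a real program, net.LookupHost has the same signature and could be passed instead.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoAddrs marks a lookup that returned no error but also no addresses.
// (Hypothetical sentinel, introduced only for this sketch.)
var errNoAddrs = errors.New("lookup returned no addresses")

// lookupFunc has the same signature as net.LookupHost, so the real
// resolver or a test stub can be passed in.
type lookupFunc func(host string) ([]string, error)

// resolveWithRetry calls lookup up to attempts times and sleeps for delay
// between failed tries. It returns the first non-empty result, or the last
// error wrapped with context once all attempts are used up.
func resolveWithRetry(lookup lookupFunc, host string, attempts int, delay time.Duration) ([]string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 1; i <= attempts; i++ {
		addrs, err := lookup(host)
		if err == nil && len(addrs) > 0 {
			return addrs, nil
		}
		if err == nil {
			err = errNoAddrs
		}
		lastErr = err
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return nil, fmt.Errorf("resolving %q failed after %d attempts: %w", host, attempts, lastErr)
}

func main() {
	// Stub resolver: fails twice, then succeeds, to mimic a pod whose
	// networking is not ready when the container starts.
	calls := 0
	stub := func(host string) ([]string, error) {
		calls++
		if calls < 3 {
			return nil, errors.New("no such host")
		}
		return []string{"10.0.0.1"}, nil
	}

	addrs, err := resolveWithRetry(stub, "agnhost-master", 5, 10*time.Millisecond)
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	fmt.Println("resolved:", addrs, "after", calls, "calls")
}
```

A fixed delay keeps the sketch short; a real fix might prefer a bounded exponential backoff so that a slow network setup does not stall the test for longer than necessary.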
https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/windows/OWNERS lists the people in charge of the gce-windows tests. @pjh