kubernetes: Integration test suite failing
I am seeing test failures when running the integration tests using test-dockerized.sh with the added flag “-p 1” to disable parallelism. The following failures are observed on the 1.18.4 branch:
=== Failed
=== FAIL: test/integration/apiserver/apply (0.00s)
I0629 02:44:47.759999 23295 etcd.go:81] etcd already running at http://127.0.0.1:2379
FAIL k8s.io/kubernetes/test/integration/apiserver/apply 106.723s
=== FAIL: test/integration/auth (0.00s)
I0629 02:47:31.644814 23520 etcd.go:81] etcd already running at http://127.0.0.1:2379
FAIL k8s.io/kubernetes/test/integration/auth 73.154s
=== FAIL: test/integration/daemonset (0.00s)
I0629 02:53:17.668164 23978 etcd.go:81] etcd already running at http://127.0.0.1:2379
FAIL k8s.io/kubernetes/test/integration/daemonset 300.637s
=== FAIL: test/integration/master TestReconcilerMasterLeaseMultiCombined (20.49s)
=== FAIL: test/integration/replicaset (0.00s)
I0629 03:13:51.149436 25639 etcd.go:81] etcd already running at http://127.0.0.1:2379
FAIL k8s.io/kubernetes/test/integration/replicaset 77.660s
=== FAIL: test/integration/scheduler (0.00s)
I0629 03:16:34.260033 25865 etcd.go:81] etcd already running at http://127.0.0.1:2379
FAIL k8s.io/kubernetes/test/integration/scheduler 84.373s
=== FAIL: test/integration/volume (0.00s)
I0629 03:20:53.646581 26343 etcd.go:81] etcd already running at http://127.0.0.1:2379
FAIL k8s.io/kubernetes/test/integration/volume 122.375s
However, if I run each of these modules individually, only one test case from each module fails, with the error
I0701 03:46:24.334858 2750 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{http://127.0.0.1:2379 <nil> 0 <nil>}]
I0701 03:46:43.342529 2750 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
E0701 03:46:43.342551 2750 master_utils.go:197] error in bringing up the master: error building core storage: context deadline exceeded
W0701 03:46:43.342618 2750 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2379 <nil> 0 <nil>}: didn't receive server preface in time. Reconnecting...
F0701 03:46:43.342680 2750 master_utils.go:199] error in bringing up the master: error building core storage: context deadline exceeded
and if that test case is run individually, it passes.
How can I get the test suite to pass? My specs are below.
Specs:
Arch: x86_64
OS: Ubuntu 18.04
CPUs: 4
go version: go version go1.14.4 linux/amd64
Attaching a log file for reference: INTEL1.18-p12.log
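For anyone triaging a similar log, the package-level failures in the summary above can be pulled out mechanically. The following is only a minimal sketch, assuming the summary lines have the form `FAIL <package> <seconds>s` as in the excerpt; the function name `failed_packages` is hypothetical and not part of the Kubernetes tooling. It does not pick up single-test lines such as the `=== FAIL: test/integration/master TestReconcilerMasterLeaseMultiCombined` entry.

```python
import re

# Matches package-level summary lines such as:
#   FAIL k8s.io/kubernetes/test/integration/auth 73.154s
# Assumption: fields are separated by spaces or tabs on a single line.
_FAIL_LINE = re.compile(
    r"^FAIL[ \t]+(\S+)[ \t]+(\d+(?:\.\d+)?)s[ \t]*$", re.MULTILINE
)


def failed_packages(log_text):
    """Return (package, seconds) pairs for every package-level FAIL line."""
    return [(pkg, float(secs)) for pkg, secs in _FAIL_LINE.findall(log_text)]
```

The resulting package list could then be used to rerun each failing package on its own, which is the comparison described above.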
About this issue
- State: closed
- Created 4 years ago
- Comments: 17 (17 by maintainers)
@RobertKielty this issue is not reflecting the failure we are seeing on https://testgrid.k8s.io/sig-release-master-blocking#integration-master. Please see https://github.com/kubernetes/test-infra/pull/18196#issuecomment-658000292. It may be good to open another issue to track that. I plan on submitting a fix today.
/close
@roycaihw I just tried running the following and the test cases passed successfully.
I guess this issue can be closed.
Thank you @roycaihw for triaging this.
@hasheddan @RobertKielty It seems that the failure of https://testgrid.k8s.io/sig-release-master-blocking#integration-master was caused by https://github.com/kubernetes/test-infra/pull/18196#issuecomment-657975786. It looks like that PR added some tests to the prowjob, which aren’t supposed to be run in parallel, and therefore hit the “etcd already running” error.
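The “etcd already running at http://127.0.0.1:2379” message suggests that something was already listening on that address when the test package started. A quick way to check for this before a manual run is sketched below. This is an assumption-laden illustration, not part of the test scripts: the helper name `port_in_use` is hypothetical, and it only detects a TCP listener on the port, not whether that listener is actually etcd.

```python
import socket


def port_in_use(host="127.0.0.1", port=2379, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use()` with the defaults checks the address from the log lines; a `True` result before the tests start would point to a leftover or concurrently running process on that port.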
I think the sig-release-master-blocking#integration-master failure is unrelated to this issue, since that failure was related to a prowjob config change introduced two days ago, whereas this issue was about running integration tests manually.
based on:
but feel free to adjust.