kubernetes: [Flaky Test][sig-node] kubernetes-unit-test TestHTTP1DoNotReuseRequestAfterTimeout is flaky

Which jobs are flaking?

periodic-kubernetes-unit-test-ppc64le

Which tests are flaking?

  1. vendor/k8s.io/client-go/rest.TestHTTP1DoNotReuseRequestAfterTimeout
  2. vendor/k8s.io/client-go/rest.TestHTTP1DoNotReuseRequestAfterTimeout/HTTP1
  3. vendor/k8s.io/client-go/rest.TestHTTP1DoNotReuseRequestAfterTimeout/HTTP2

Since when has it been flaking?

16th November 2021

Testgrid link

https://k8s-testgrid.appspot.com/sig-node-ppc64le#unit-tests

Reason for failure (if possible)

This test was added by https://github.com/kubernetes/kubernetes/pull/104844.

It appears to have been flaky since it was introduced.
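
For context, the test exercises the client-side guarantee that an HTTP/1 connection is never reused after a request times out on it (otherwise a later request could read the stale response the server eventually writes). Below is a minimal standalone sketch of that behavior; this is not the actual test code, and the paths, timings, and helper are illustrative only:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

func main() {
	var (
		mu      sync.Mutex
		remotes []string // RemoteAddr of the connection that served each request
	)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		remotes = append(remotes, r.RemoteAddr)
		mu.Unlock()
		if r.URL.Path == "/hang" {
			time.Sleep(500 * time.Millisecond) // deliberately outlive the client timeout
		}
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 100 * time.Millisecond}
	get := func(path string) {
		resp, err := client.Get(srv.URL + path)
		if err != nil {
			fmt.Println("GET", path, "->", err)
			return
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection returns to the keep-alive pool
		resp.Body.Close()
	}

	get("/foo")             // succeeds; the connection is pooled for reuse
	get("/hang")            // fails: "... (Client.Timeout exceeded while awaiting headers)"
	time.Sleep(time.Second) // let the hung handler finish
	get("/foo")             // must arrive on a NEW connection, not the timed-out one

	mu.Lock()
	fmt.Println(remotes) // the first and last RemoteAddr should differ
	mu.Unlock()
}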

Anything else we need to know?

Below is the trace from the job:

=== FAIL: vendor/k8s.io/client-go/rest TestHTTP1DoNotReuseRequestAfterTimeout/HTTP1 (0.13s)
    request_test.go:3016: Unexpected error: Get "https://127.0.0.1:34701/foo?timeout=100ms": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/20 12:06:03 http: TLS handshake error from 127.0.0.1:37890: read tcp 127.0.0.1:34701->127.0.0.1:37890: use of closed network connection
    --- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout/HTTP1 (0.13s)
=== FAIL: vendor/k8s.io/client-go/rest TestHTTP1DoNotReuseRequestAfterTimeout (0.30s)

Relevant SIG(s)

/sig node

Most upvoted comments

Submitted https://github.com/kubernetes/kubernetes/pull/106716 to bump the timeout on this test; thanks for keeping the pressure on.
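
For reference, the shape of such a fix is to widen the client timeout so that race-detector and scheduler overhead on a loaded CI host cannot spuriously trip it. A hypothetical fragment (the helper name and values are illustrative, not taken from the PR):

package main

import (
	"net/http"
	"time"
)

// newTestClient is a hypothetical helper, not code from PR #106716.
// The deflaking idea: a 100ms budget leaves little headroom once the race
// detector's slowdown and CI load are in play, so the timeout is raised
// well above the handler's deliberate hang instead.
func newTestClient() *http.Client {
	return &http.Client{Timeout: 1 * time.Second} // was effectively 100ms in the flaky runs
}

func main() { _ = newTestClient() }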

Why was the binary for ^TestReconnectBroken being generated when we are running the stress tool for TestHTTP1DoNotReuseRequestAfterTimeout? Please help me understand if I am missing something.

I was lazy and carried over the CLI flags, but it doesn't matter: -run is a runtime filter interpreted by the test binary, not a build input, so the generated binary is the same.

$ go test -run ^TestReconnectBroken k8s.io/client-go/rest -c -race -o test_withflag
$ go test k8s.io/client-go/rest -c -race -o test_withoutflag
$ md5sum test_*
cda6ddc9d31f4c760c93b34ad78981dc  test_withflag
cda6ddc9d31f4c760c93b34ad78981dc  test_withoutflag

There were some HTTP changes in Go: golang/go@f9cb33c

However, I can't reproduce it on my host 😕

stress ./rest.test -test.run TestHTTP1DoNotReuseRequestAfterTimeout -test.v
5s: 538 runs so far, 0 failures
10s: 1096 runs so far, 0 failures
15s: 1656 runs so far, 0 failures
20s: 2208 runs so far, 0 failures
25s: 2760 runs so far, 0 failures
30s: 3312 runs so far, 0 failures
35s: 3867 runs so far, 0 failures
40s: 4419 runs so far, 0 failures
45s: 4977 runs so far, 0 failures
go version
go version go1.17.3 linux/amd64

I compiled the test binary with the -race flag enabled; we see this flake only when the race detector is on (its slowdown tightens the headroom on the 100ms timeout). @aojea Can you please try passing -race when compiling the binary for the stress tool?

I followed https://github.com/kubernetes/community/blob/master/contributors/devel/sig-testing/flaky-tests.md#deflaking-unit-tests (using the stress tool, golang.org/x/tools/cmd/stress).

@aojea When I run the test under the stress tool, it flakes on both ppc64le and x86_64:

  • on ppc64le:
[root@rajalakshmi-workspace1 kubernetes]# stress ./rest.test -test.run TestHTTP1DoNotReuseRequestAfterTimeout

/tmp/go-stress-20211123T070635-1678334821
--- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout (0.45s)
    --- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout/HTTP1 (0.28s)
        request_test.go:2974: TEST Connected from 127.0.0.1:47712 on /foo
        request_test.go:2974: TEST Connected from 127.0.0.1:47712 on /hang
        request_test.go:2976: TEST hanging 127.0.0.1:47712
        request_test.go:2974: TEST Connected from 127.0.0.1:47730 on /foo
        request_test.go:3030: Unexpected error: Get "https://127.0.0.1:35107/foo?timeout=100ms": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
FAIL


ERROR: exit status 1


/tmp/go-stress-20211123T070635-1511241200
2021/11/23 07:06:38 http: TLS handshake error from 127.0.0.1:52740: read tcp 127.0.0.1:36005->127.0.0.1:52740: read: connection reset by peer
--- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout (0.29s)
    --- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout/HTTP1 (0.11s)
        request_test.go:3016: Unexpected error: Get "https://127.0.0.1:36005/foo?timeout=100ms": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
FAIL


ERROR: exit status 1

5s: 72 runs so far, 2 failures (2.78%)
  • on x86_64:
[root@toad1 kubernetes]# stress ./rest.test -test.run TestHTTP1DoNotReuseRequestAfterTimeout
5s: 50 runs so far, 0 failures

/tmp/go-stress-20211123T025122-1432112823
--- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout (0.27s)
    --- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout/HTTP2 (0.14s)
        request_test.go:3016: Unexpected error: Get "https://127.0.0.1:44741/foo?timeout=100ms": context deadline exceeded
        request_test.go:2974: TEST Connected from 127.0.0.1:38406 on /foo
FAIL


ERROR: exit status 1

10s: 101 runs so far, 1 failures (0.99%)
15s: 156 runs so far, 1 failures (0.64%)
20s: 204 runs so far, 1 failures (0.49%)
25s: 250 runs so far, 1 failures (0.40%)
30s: 298 runs so far, 1 failures (0.34%)
35s: 351 runs so far, 1 failures (0.28%)
40s: 402 runs so far, 1 failures (0.25%)
45s: 454 runs so far, 1 failures (0.22%)
50s: 501 runs so far, 1 failures (0.20%)
55s: 553 runs so far, 1 failures (0.18%)
1m0s: 605 runs so far, 1 failures (0.17%)
1m5s: 655 runs so far, 1 failures (0.15%)

/tmp/go-stress-20211123T025122-2483598037
2021/11/23 02:52:32 http: TLS handshake error from 127.0.0.1:51906: read tcp 127.0.0.1:39957->127.0.0.1:51906: use of closed network connection
--- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout (0.37s)
    --- FAIL: TestHTTP1DoNotReuseRequestAfterTimeout/HTTP1 (0.24s)
        request_test.go:2974: TEST Connected from 127.0.0.1:51894 on /foo
        request_test.go:2974: TEST Connected from 127.0.0.1:51894 on /hang
        request_test.go:2976: TEST hanging 127.0.0.1:51894
        request_test.go:3030: Unexpected error: Get "https://127.0.0.1:39957/foo?timeout=100ms": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
FAIL


ERROR: exit status 1

1m10s: 706 runs so far, 2 failures (0.28%)

@aojea the Go version was:

[root@toad1 kubernetes]# go version
go version go1.17.3 linux/amd64
[root@toad1 kubernetes]#