runtime: System.Net tests failing with System.TimeoutException

For example:

  • Test: System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientHandler_ConnectionPooling_Test.Http2_SmallConnectionTimeout_SubsequentRequestUsesDifferentConnection
      System.TimeoutException : The operation has timed out.
      Stack Trace:
        /_/src/libraries/Common/tests/System/Net/Http/Http2LoopbackServer.cs(196,0): at System.Net.Test.Common.Http2LoopbackServerFactory.CreateServerAsync(Func`3 funcAsync, Int32 millisecondsTimeout)
        /_/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs(1441,0): at System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientHandler_ConnectionPooling_Test.Http2_SmallConnectionTimeout_SubsequentRequestUsesDifferentConnection(String timeoutPropertyName)
        --- End of stack trace from previous location ---

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

Looks like the failures stopped with #55006.

There are 4 tests in the *_Http3_Mock category. Could they have regressed due to your recent changes?

GetAsync_CancelDuringResponseHeadersReceived_TaskCanceledQuickly is a test I enabled for HTTP3 in my PR, so it’s possible that this is a separate regression from whatever else is going on here. But it seems like we should figure out the general issue here first.

@karelz it’s quite easy for me to re-send a job from > 4 days ago to the current machines and see if it still has the problem, even repeatedly if that’s useful. Just pick a “blessed” one and I can send it around.

Some more thoughts:

  • Helix machines didn’t change on the 23rd; the last rollout was on the 24th. So if you’re looking at this happening “around 6/18-6/23”, the machines did not change in that window.
  • The only Linux things that changed in that time were RedHat 7 and SLES 12. Nothing else about, say, the ubuntu.1604.amd64.open.rt image changed in that time frame (image version, artifacts applied, etc.).

The top 27 methods by number of timeout failures all fail on ubuntu.1604.amd64.open.rt.

QueueName                            MainFailures  PrFailures  AffectedBranches
ubuntu.1604.amd64.open.rt                      36        2671               101
sles.15.amd64.open.rt                           3         108                34
ubuntu.1804.amd64.open.rt                       6          38                26
osx.1015.amd64.open                             2           9                 9
windows.10.amd64.serverrs5.open.rt              4          14                 7
windows.10.amd64.server19h1.open.rt             2           7                 6
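
To make the skew in the table above easier to see, here is a minimal Python sketch that computes each queue's share of the timeout failures. The numbers are copied from the table; the names `ROWS` and `failure_shares` are illustrative and not part of any dotnet/runtime or Helix tooling. The shares are computed only over the six queues listed here, so they would differ if the original query returned more rows.

```python
# Per-queue timeout failure counts, copied from the table above.
# Columns: (queue name, MainFailures, PrFailures, AffectedBranches)
ROWS = [
    ("ubuntu.1604.amd64.open.rt", 36, 2671, 101),
    ("sles.15.amd64.open.rt", 3, 108, 34),
    ("ubuntu.1804.amd64.open.rt", 6, 38, 26),
    ("osx.1015.amd64.open", 2, 9, 9),
    ("windows.10.amd64.serverrs5.open.rt", 4, 14, 7),
    ("windows.10.amd64.server19h1.open.rt", 2, 7, 6),
]


def failure_shares(rows):
    """Return {queue: (share of main failures, share of PR failures)}."""
    total_main = sum(r[1] for r in rows)
    total_pr = sum(r[2] for r in rows)
    return {
        name: (main / total_main, pr / total_pr)
        for name, main, pr, _branches in rows
    }


if __name__ == "__main__":
    for queue, (main_share, pr_share) in failure_shares(ROWS).items():
        print(f"{queue:<37} main {main_share:6.1%}  PR {pr_share:6.1%}")
```

Across these six rows, ubuntu.1604.amd64.open.rt accounts for 2671 of 2847 PR failures (about 94%) and 36 of 53 main failures (about 68%).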