runtime: Http Test hang/failure: MaxConnectionsPerServer_WaitingConnectionsAreCancelable
https://mc.dot.net/#/user/filipnavara/pr~2Fjenkins~2Fdotnet~2Fcorefx~2Fmaster~2F/test~2Ffunctional~2Fcli~2F/68ed63990b09ad9ceea911374da90acb1b042a89/workItem/System.Net.Http.Functional.Tests/wilogs currently show timeouts being hit on
- RedHat.74.Amd64.Open-x64:Release
- Centos.74.Amd64.Open-x64:Release
The failed runs all indicate that the xunit process made progress, then stopped printing to the console, and 20 minutes later the executor terminated the process. The logs show the automated reruns happened on different machines, and reproduced.
Snippets from one of the runs console.log:
2018-08-28 09:37:46,047: INFO: proc(54): run_and_log_output: Output: xUnit.net Console Runner v2.4.1-pre.build.4059 (64-bit .NET Core 4.6.26827.04)
2018-08-28 09:37:46,142: INFO: proc(54): run_and_log_output: Output: [37m Discovering: System.Net.Http.Functional.Tests
2018-08-28 09:37:46,402: INFO: proc(54): run_and_log_output: Output: [m[37m Discovered: System.Net.Http.Functional.Tests
2018-08-28 09:37:46,403: INFO: proc(54): run_and_log_output: Output: [m[37m Starting: System.Net.Http.Functional.Tests
2018-08-28 09:37:47,133: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClient_SelectedSites_Test.RetrieveSite_Succeeds [SKIP]
2018-08-28 09:37:47,133: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsSelectedSitesTestEnabled"
2018-08-28 09:37:47,133: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClient_SelectedSites_Test.RetrieveSite_Debug_Helper [SKIP]
2018-08-28 09:37:47,134: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsSelectedSitesTestEnabled"
2018-08-28 09:37:47,134: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClient_SelectedSites_Test.RetrieveSite_Succeeds [SKIP]
2018-08-28 09:37:47,134: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsSelectedSitesTestEnabled"
2018-08-28 09:37:47,134: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClient_SelectedSites_Test.RetrieveSite_Debug_Helper [SKIP]
2018-08-28 09:37:47,134: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsSelectedSitesTestEnabled"
2018-08-28 09:37:47,493: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientMiniStress.MakeAndFaultManyRequests [SKIP]
2018-08-28 09:37:47,493: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:47,494: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientMiniStress.UnreadResponseMessage_Collectible [SKIP]
2018-08-28 09:37:47,494: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:47,494: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientMiniStress.SingleClient_ManyGets_Async [SKIP]
2018-08-28 09:37:47,494: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:47,495: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientMiniStress.CreateAndDestroyManyClients [SKIP]
2018-08-28 09:37:47,495: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:47,495: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientMiniStress.ManyClients_ManyPosts_Async [SKIP]
2018-08-28 09:37:47,495: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:47,495: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientMiniStress.ManyClients_ManyGets [SKIP]
2018-08-28 09:37:47,496: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:47,496: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientMiniStress.SingleClient_ManyGets_Sync [SKIP]
2018-08-28 09:37:47,496: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:48,960: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientEKUTest.HttpClient_ClientEKUServerAuth_Fails [SKIP]
2018-08-28 09:37:48,960: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestCertificates"
2018-08-28 09:37:48,960: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientEKUTest.HttpClient_NoEKUServerAuth_Ok [SKIP]
2018-08-28 09:37:48,961: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestCertificates"
2018-08-28 09:37:48,961: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientEKUTest.HttpClient_NoEKUClientAuth_Ok [SKIP]
2018-08-28 09:37:48,961: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestClientCertificates"
2018-08-28 09:37:48,961: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientEKUTest.HttpClient_ServerEKUClientAuth_Fails [SKIP]
2018-08-28 09:37:48,961: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestClientCertificates"
2018-08-28 09:37:48,962: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientEKUTest.HttpClient_ServerEKUClientAuth_Fails [SKIP]
2018-08-28 09:37:48,962: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestClientCertificates"
2018-08-28 09:37:48,962: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientEKUTest.HttpClient_NoEKUClientAuth_Ok [SKIP]
2018-08-28 09:37:48,962: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestClientCertificates"
2018-08-28 09:37:48,962: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientEKUTest.HttpClient_NoEKUServerAuth_Ok [SKIP]
2018-08-28 09:37:48,963: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestCertificates"
2018-08-28 09:37:48,963: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientEKUTest.HttpClient_ClientEKUServerAuth_Fails [SKIP]
2018-08-28 09:37:48,963: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "CanTestCertificates"
2018-08-28 09:37:49,923: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientMiniStress.MakeAndFaultManyRequests [SKIP]
2018-08-28 09:37:49,924: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:49,924: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientMiniStress.ManyClients_ManyPosts_Async [SKIP]
2018-08-28 09:37:49,924: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:49,924: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientMiniStress.SingleClient_ManyGets_Sync [SKIP]
2018-08-28 09:37:49,924: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:49,924: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientMiniStress.UnreadResponseMessage_Collectible [SKIP]
2018-08-28 09:37:49,925: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:49,925: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientMiniStress.ManyClients_ManyGets [SKIP]
2018-08-28 09:37:49,925: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:49,925: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientMiniStress.SingleClient_ManyGets_Async [SKIP]
2018-08-28 09:37:49,926: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:49,926: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientMiniStress.CreateAndDestroyManyClients [SKIP]
2018-08-28 09:37:49,926: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsStressModeEnabled"
2018-08-28 09:37:53,811: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientHandlerTest.Ctor_ExpectedDefaultPropertyValues_UapPlatform [SKIP]
2018-08-28 09:37:53,811: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsUap"
2018-08-28 09:37:53,925: INFO: proc(54): run_and_log_output: Output: [m[33;1m System.Net.Http.Functional.Tests.PlatformHandler_HttpClientHandlerTest.Ctor_ExpectedDefaultPropertyValues_UapPlatform [SKIP]
2018-08-28 09:37:53,925: INFO: proc(54): run_and_log_output: Output: [m[37m Condition(s) not met: "IsUap"
(log terminates)
And run_client.py:
2018-08-28 09:37:44,283: INFO: executor(112): run: Successfully downloaded work item payload
2018-08-28 09:37:44,283: INFO: executor(463): _execute_command: running chmod +x $HELIX_WORKITEM_PAYLOAD/*.sh && $HELIX_PYTHONPATH $HELIX_CORRELATION_PAYLOAD/RunnerScripts/scriptrunner/scriptrunner.py --script RunTests.sh $HELIX_CORRELATION_PAYLOAD in /home/helixbot/dotnetbuild/work/a01a5d92-9983-471e-a012-334328feca8f/Work/95b27b09-8f38-4fae-b2a7-a3e5f646d4b6/Exec max 1200 seconds
2018-08-28 09:38:28,804: INFO: servicebusrepository(86): renew_workitem_lock: Entering renew_workitem_lock for https://nethelix.servicebus.windows.net/centos.74.amd64.open/messages/89187500/6b012b36-ef7b-43c3-b6c1-1951676536ab
2018-08-28 09:38:28,914: INFO: saferequests(90): request_with_retry: Response complete with status code '200'
<snip />
2018-08-28 09:57:16,851: INFO: servicebusrepository(86): renew_workitem_lock: Entering renew_workitem_lock for https://nethelix.servicebus.windows.net/centos.74.amd64.open/messages/89187500/6b012b36-ef7b-43c3-b6c1-1951676536ab
2018-08-28 09:57:16,990: INFO: saferequests(90): request_with_retry: Response complete with status code '200'
2018-08-28 09:57:16,990: INFO: servicebusrepository(99): renew_workitem_lock: renew_workitem_lock Status Code: 200
2018-08-28 09:57:44,286: ERROR: job(44): kill: Job running for too long. Killing...
2018-08-28 09:57:44,301: ERROR: executor(474): _execute_command: Executor timed out after 1200 seconds and was killed.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 23 (23 by maintainers)
It seems like it is still problem. form CI Logs: [Long Running Test] ‘System.Net.Http.Functional.Tests.PlatformHandler_HttpClientHandler_Cancellation_Test.MaxConnectionsPerServer_WaitingConnectionsAreCancelable’, Elapsed: 00:17:56
I just checked in on the data this morning, and it looks like disabling the test worked. This weekend we saw less than ten test runs time out. That’s in contrast to almost a hundred last weekend.
I’m having a bit of trouble setting up a machine to repro the issue, but once I get that together I’ll try to fix the test itself.
@karelz I’ll disable that test now.
Yes, can easily reproduce it locally now. Hangs roughly 20% of the time, more if the tests run in parallel. Managed to track it to this specific line with some reasonable confidence (stuck on that line 10 out of 10 hangs):
https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/tests/FunctionalTests/HttpClientHandlerTest.Cancellation.cs#L340
We analyzed failure frequency for last month and something indeed changed:
Starting 8/28 we started getting slightly more failures on Debian.87, OpenSuse.423, Ubuntu.1404 and Ubuntu.1604 (2 failures of each on 8/28 and 1 failure of each on 8/29 - it may be related to specific PR?).[UPDATE] They are by-design, specific to PR dotnet/corefx#32006Note: These new failures are in different test categories: System.Net.Requests.Tests and System.Net.Security.TestsThanks @rmkerr and @wfurt for the data. @rmkerr can you please look at the 1st regression on Centos.74/RedHat.74/OSX.1013? That looks like something we should go after … We might want to check if there was any update to the OS images around 8/22.