runtime: Test failure System.Net.WebSockets.Tests.WebSocketDeflateTests.PayloadShouldHaveSimilarSizeWhenSplitIntoSegments
Run: runtime 20210428.85
Failed test:
net6.0-Linux-Release-arm-CoreCLR_checked-(Alpine.313.Arm32.Open)Ubuntu.1804.ArmArch.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.13-helix-arm32v7-20210414141857-1ea6b0a
-System.Net.WebSockets.Tests.WebSocketDeflateTests.PayloadShouldHaveSimilarSizeWhenSplitIntoSegments(windowBits: 15)
Error message:
System.Threading.Tasks.TaskCanceledException : A task was canceled.
Stack trace
at System.Net.WebSockets.ManagedWebSocket.SendFrameFallbackAsync(MessageOpcode opcode, Boolean endOfMessage, Boolean disableCompression, ReadOnlyMemory`1 payloadBuffer, CancellationToken cancellationToken) in /_/src/libraries/System.Net.WebSockets/src/System/Net/WebSockets/ManagedWebSocket.cs:line 564
at System.Net.WebSockets.Tests.WebSocketDeflateTests.PayloadShouldHaveSimilarSizeWhenSplitIntoSegments(Int32 windowBits) in /_/src/libraries/System.Net.WebSockets/tests/WebSocketDeflateTests.cs:line 444
--- End of stack trace from previous location ---
About this issue
- State: closed
- Created 3 years ago
- Comments: 40 (40 by maintainers)
I’ve found the difference between what I was running on my machine and what was failing in CI. I was running on the Release runtime, but the test actually ran overtime on the Checked runtime. Here on the Release runtime, the test takes the same 0.3s I was seeing.
And here on the Checked runtime, it takes more than 5s.
The pipeline with these results is here.
I will build Checked runtime on my machine to confirm I have the repro and to see what exactly takes the time.
@BruceForstall this was done in #52086
I agree (this is a reminder to us all 🙂) that when a test fails regularly in CI we should disable it immediately if we can’t fix it immediately. We’re good engineers and prefer to investigate and fix rather than disable anything, but that can happen while it’s disabled. 🙂
I’ve run the measurements multiple times on Checked runtime and it was indeed Random taking all the time there: random part takes 4.8s and deflate part takes 0.3s. When I apply fix from https://github.com/dotnet/runtime/pull/52052, random part reduces to 0.02s 😲. @danmoseley should we pass a word about it to people owning Random? I will reopen @zlatanov’s PR as the fix is working, and I will double-check that in CI 😊
I was racking my brain to think what we were missing, and you figured it out. 👏
You can run it manually from your branch. Just click on “Run Pipeline” here: https://dev.azure.com/dnceng/public/_build?definitionId=686
Then, on the source branch choose your branch… you can push a branch to the dotnet fork of dotnet/runtime and then it will show up as available to run the pipeline from; or if you have a PR you can use refs/pull/<PRid>.
I spoke imprecisely, I meant when it failed, it was always 14 or 15, try this query:
https://engsrvprod.kusto.windows.net/engineeringdata
Got it. Interesting. Apologies for crossing questions on the PR as well.
@CarnaViire I created a PR; now we wait to see the run times for the test. If I am correct, the fix should be successful. If, however, you feel that we should remove the test, let me know.
Deflate only describes the structure/format of the data. The algorithm is an implementation detail that might vary with the implementation (for example zlib-intel vs classic zlib), memory constraints, and performance optimizations that might trade compression ratio for speed.
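To illustrate the point about format vs algorithm, here is a minimal sketch using Python's standard `zlib` module (not the .NET code path under test). It compresses one payload as raw deflate at several compression levels, checks that each output decompresses back to the original, and reports the compressed sizes. The function name `deflate_sizes` and the sample payload are hypothetical, chosen only for this illustration.

```python
import zlib


def deflate_sizes(payload, levels=(1, 6, 9), wbits=-15):
    """Compress `payload` as raw deflate at each level and return {level: size}.

    A negative `wbits` asks zlib for a raw deflate stream (no zlib header or
    checksum); -15 corresponds to a 32 KiB window.
    """
    sizes = {}
    for level in levels:
        compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
        compressed = compressor.compress(payload) + compressor.flush()
        # Whatever effort the compressor spent, the output is still valid
        # deflate and round-trips to exactly the same bytes.
        assert zlib.decompress(compressed, wbits) == payload
        sizes[level] = len(compressed)
    return sizes


if __name__ == "__main__":
    # Hypothetical sample payload: repetitive text, so it compresses well.
    sample = b"hello websocket " * 1000
    for level, size in deflate_sizes(sample).items():
        print(f"level {level}: {size} bytes")
```

The sizes may differ between levels (and between zlib builds) even though every output decodes to the same bytes, which is why a test that compares compressed sizes can only expect them to be similar, not identical.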