runtime: Test failure System.Net.WebSockets.Tests.WebSocketDeflateTests.PayloadShouldHaveSimilarSizeWhenSplitIntoSegments

Run: runtime 20210428.85

Failed test:

net6.0-Linux-Release-arm-CoreCLR_checked-(Alpine.313.Arm32.Open)Ubuntu.1804.ArmArch.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.13-helix-arm32v7-20210414141857-1ea6b0a
 -System.Net.WebSockets.Tests.WebSocketDeflateTests.PayloadShouldHaveSimilarSizeWhenSplitIntoSegments(windowBits: 15)

Error message:

System.Threading.Tasks.TaskCanceledException : A task was canceled.


Stack trace:
   at System.Net.WebSockets.ManagedWebSocket.SendFrameFallbackAsync(MessageOpcode opcode, Boolean endOfMessage, Boolean disableCompression, ReadOnlyMemory`1 payloadBuffer, CancellationToken cancellationToken) in /_/src/libraries/System.Net.WebSockets/src/System/Net/WebSockets/ManagedWebSocket.cs:line 564
   at System.Net.WebSockets.Tests.WebSocketDeflateTests.PayloadShouldHaveSimilarSizeWhenSplitIntoSegments(Int32 windowBits) in /_/src/libraries/System.Net.WebSockets/tests/WebSocketDeflateTests.cs:line 444
--- End of stack trace from previous location ---

About this issue

State: closed
Created 3 years ago
Comments: 40 (40 by maintainers)

Commits related to this issue

Fix for failing WebSocket deflate test on ARM (#52052) Reducing the number of times Random.Next is called to improve runtime performance of test on ARM. Fixes #52031 — committed to dotnet/runtime by zlatanov 3 years ago

Most upvoted comments

I’ve found the difference between what I was running on my machine and what was failing in CI. I was running on a Release runtime, but the test actually ran over time on a Checked runtime. Here on the Release runtime, the test takes the same 0.3s I was seeing. And here on the Checked runtime, it takes more than 5s. The pipeline with these results is here. I will build a Checked runtime on my machine to confirm I have the repro and to see what exactly takes the time.

CarnaViire on May 10, 2021

@BruceForstall this was done in #52086

I agree (this is a reminder to us all 🙂) that when a test fails regularly in CI we should disable it immediately if we can’t fix it immediately. We’re good engineers and prefer to investigate and fix rather than disable anything, but that can happen while it’s disabled. 🙂

danmoseley on Apr 30, 2021

I’ve run the measurements multiple times on the Checked runtime, and it was indeed Random taking all the time there: the random part takes 4.8s and the deflate part takes 0.3s. When I apply the fix from https://github.com/dotnet/runtime/pull/52052, the random part drops to 0.02s 😲. @danmoseley should we pass a word about it to the people who own Random? I will reopen @zlatanov’s PR as the fix is working, and I will double-check that in CI 😊

CarnaViire on May 11, 2021
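
To illustrate why cutting the number of random-number calls can matter so much, here is a minimal sketch in Python (not the .NET test, and not necessarily what the PR does). It fills a buffer once with one generator call per byte and once with a single bulk call. The function names `fill_per_byte` and `fill_bulk` are made up for this example; the rough .NET analogue of the bulk variant would be `Random.NextBytes`, but whether the fix uses it is an assumption.

```python
import random


def fill_per_byte(rng: random.Random, size: int):
    """Fill a buffer with one generator call per byte (size calls in total)."""
    buf = bytearray(size)
    calls = 0
    for i in range(size):
        buf[i] = rng.randrange(256)  # one call for every single byte
        calls += 1
    return bytes(buf), calls


def fill_bulk(rng: random.Random, size: int):
    """Fill a buffer with a single generator call for all bytes."""
    # getrandbits returns one big integer; to_bytes turns it into `size` bytes.
    data = rng.getrandbits(size * 8).to_bytes(size, "little")
    return data, 1


if __name__ == "__main__":
    size = 100_000
    slow, slow_calls = fill_per_byte(random.Random(42), size)
    fast, fast_calls = fill_bulk(random.Random(42), size)
    print(f"per-byte: {slow_calls} calls, bulk: {fast_calls} call, "
          f"both produce {len(slow)} == {len(fast)} bytes")
```

On a runtime where each call carries extra overhead (such as the assertions in a Checked build), the per-call cost is multiplied by the number of calls, so fewer calls help even if the total amount of random data stays the same.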

I was racking my brain to think what we were missing, and you figured it out. 👏

danmoseley on May 10, 2021

and I don’t know how to trigger it

You can run it manually from your branch. Just click on “Run Pipeline” here: https://dev.azure.com/dnceng/public/_build?definitionId=686

Then, on the source branch choose your branch… you can push a branch to the dotnet fork of dotnet/runtime and then it will show up as available to run the pipeline from; or if you have a PR you can use refs/pull/<PRid>.

safern on May 10, 2021

@danmoseley what makes you say it was always failing?

I spoke imprecisely. I meant that when it failed, it was always 14 or 15. Try this query:

https://engsrvprod.kusto.windows.net/engineeringdata

TestResults 
| join kind=inner WorkItems on WorkItemId 
| join kind=inner Jobs on JobId
| where Finished >= datetime(2021-3-1 0:00:00)
and Type == "System.Net.WebSockets.Tests.WebSocketDeflateTests"
and Branch == "refs/heads/main"
| summarize count() by Method, Result, QueueAlias, Arguments, Message

danmoseley on May 7, 2021

Got it. Interesting. Apologies for crossing questions on the PR as well.

danmoseley on Apr 30, 2021

@CarnaViire I created a PR; now we wait to see the run times for the test. If I am correct, the fix should be successful. If, however, you feel that we should remove the test, let me know.

zlatanov on Apr 29, 2021

Btw, how could the algorithm change so much that the combined size of the compressed parts of the message becomes much bigger or smaller than the compressed whole message? Wouldn’t it then be a different, new algorithm, not DEFLATE as described in the RFC?

The DEFLATE specification only describes the structure/format of the data. The algorithm is an implementation detail that might differ between implementations (for example zlib-intel vs classic zlib) because of memory constraints or different performance optimizations that trade compression ratio for speed.

zlatanov on Apr 29, 2021
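
The point about format versus algorithm can be sketched in Python with the standard-library `zlib` module (this is an illustration, not the .NET test). The same payload is compressed at two different levels and once in flushed segments; the outputs may differ in size, yet each one is a valid raw DEFLATE stream that inflates back to the original bytes. The helper names `raw_deflate` and `inflate`, the payload, and the segment size are assumptions made up for this example.

```python
import random
import zlib
from typing import Optional


def raw_deflate(data: bytes, level: int, segment: Optional[int] = None) -> bytes:
    """Compress to raw DEFLATE (negative wbits means no zlib header or trailer).

    If `segment` is given, the data is fed in chunks of that size and the
    compressor is sync-flushed after each chunk, similar in spirit to
    sending one message as several fragments.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    if segment is None:
        return comp.compress(data) + comp.flush()
    out = bytearray()
    for start in range(0, len(data), segment):
        out += comp.compress(data[start:start + segment])
        out += comp.flush(zlib.Z_SYNC_FLUSH)  # emit everything buffered so far
    out += comp.flush()  # finish the stream
    return bytes(out)


def inflate(blob: bytes) -> bytes:
    """Decompress a raw DEFLATE stream produced by raw_deflate."""
    return zlib.decompress(blob, -15)


# A compressible payload: 20,000 bytes drawn from only four symbols.
rng = random.Random(0)
payload = bytes(rng.choice(b"abcd") for _ in range(20_000))

fast = raw_deflate(payload, level=1)
best = raw_deflate(payload, level=9)
segmented = raw_deflate(payload, level=6, segment=1_000)

if __name__ == "__main__":
    for name, blob in (("level 1", fast), ("level 9", best), ("segmented", segmented)):
        # Sizes depend on the zlib build, which is exactly the point.
        print(f"{name}: {len(blob)} bytes, round-trips: {inflate(blob) == payload}")
```

Because the exact sizes depend on the compressor's implementation and settings, a test that compares the segmented size against the whole-message size can only reasonably expect them to be similar, not identical.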