runtime: System.IO.Pipelines.Pipe.ReadAsync fails to cancel on CancellationToken
Description
I have a hang dump of a process that is stuck in a PipeReader.ReadAsync call that was canceled via CancellationToken but which never canceled the returned ValueTask<ReadResult>. This resulted in the hang.
I do not have a reliable repro of it but I have a rash of pipelines and repos that all suffer from frequent hangs in test runs and ultimately they all lead into this same function so I believe they are all caused by this one problem.
Looking at the dump file in WinDbg I took these notes, which form a kind of continuation chain mixed with some state in each of the objects. Below you can see that the CancellationToken that was passed to PipeReader.ReadAsync is in fact cancelled, but that the Pipe._readerAwaitable field has no CancellationTokenRegistration for it. You can also see that ReadAsync has not completed, so we’re blocked waiting for ReadAsync to honor the CancellationToken yet it never will because it wasn’t even registered to notice.
0000020b0e0e2048 Pipe
0000020b0e0e20e8 _readerAwaitable
10 _awaitableState
0000020b0e0e2110 _cancellationTokenRegistration
0 _id
0 _node
0000020b0e0e2278 DefaultPipeReader.ReadAsync
0000020b0e0e2048 _pipe
0000020b0dff7d38 System.IO.Pipelines.PipeReaderStream+<ReadAsyncInternal
0000020b0dfdeeb8 cancellationToken.m_source (_state: 3, cancelled)
0000020b0e0e2278 _pipeReader
0000020b0dff7df8 MessagePackStreamReader.TryReadMoreDataAsync
0000020b0dfdeeb8 cancellationToken.m_source (_state: 3, cancelled)
0000020b0dfde1a8 underlying System.IO.Pipelines.PipeReaderStream
0000020b0dfde5b8 V2Formatter
Configuration
Windows 10 The hangs show up on .NET Framework and .NET Core 3.1 at least. x64 process System.IO.Pipelines v5.00.120.57516 (nuget package version v5.0.1)
Regression?
Unknown.
Other information
This bug was originally filed against Nerdbank.Streams as https://github.com/AArnott/Nerdbank.Streams/issues/387 before I tracked it into System.IO.Pipelines.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 25 (25 by maintainers)
Ok, I found a cancellation bug in Nerdbank.Streams this time around (AArnott/Nerdbank.Streams#414). So till evidence suggests otherwise, I will consider this bug fixed with the original PR. @davidfowl, I will do some more testing of the Pipelines fix combined with the Nerdbank.Streams fix. If I don’t see any hangs in test runs, I’ll request that we resume the backport of the fix made for this bug.
This looked interesting, so I did a little digging and here is what I found:
https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L46-L60
If the cancellation token gets cancelled just before executing line
58here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L58then the callback will be invoked synchronously. The callback is actually: https://github.com/dotnet/runtime/blob/033b2ac49a3f473a790da21dd67cbea02ad93c8b/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/Pipe.cs#L1043-L1051
which calls
PipeAwaitablehttps://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L139-L150However if we read carefully in
CancellationTokenFiredwe find thatCancel(out completionData)is only called ifCancellationToken.IsCancellationRequested. If we go to the definition ofCancellationTokeninPipeAwaitableit is defined as: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L21-L23It uses the
_cancellationTokenRegistrationfield to extract the cancellation token. However if we go back to the beginning, we will see that the_cancellationTokenRegistrationis not yet set: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L58The summary is that
UnsafeRegistermethod invokes the callback synchronously, but the callback uses the result fromUnsafeRegister. Writing a test for this particular case would be extremely difficult in my opinion.//cc @davidfowl
Thank you, @BrennanConroy. I’ll take another look at the dump with that insight.
I can confirm from the dump that the System.IO.Pipelines.dll loaded in the process is 5.0.1+b02e13abab3d6f499751af2ad1ae6333fb4e633d.
The dump file isn’t of the .NET Framework. It’s a .NET Core process (coreclr.dll 4,700,21,35901 @Commit: 2aee3eb10b6303dff09cbe5ac99b6fae4644f346).
If the bug you found that can be blamed for the cause of the hang in this dump can’t explain a failure on .NET Framework then I will find a .NET Framework dump as well that we can investigate.