runtime: System.IO.Pipelines.Pipe.ReadAsync fails to cancel on CancellationToken

Description

I have a hang dump of a process that is stuck in a PipeReader.ReadAsync call that was canceled via CancellationToken but which never canceled the returned ValueTask<ReadResult>. This resulted in the hang.

I do not have a reliable repro of it but I have a rash of pipelines and repos that all suffer from frequent hangs in test runs and ultimately they all lead into this same function so I believe they are all caused by this one problem.

Dump file. Failed pipeline.

Looking at the dump file in WinDbg I took these notes, which form a kind of continuation chain mixed with some state in each of the objects. Below you can see that the CancellationToken that was passed to PipeReader.ReadAsync is in fact cancelled, but that the Pipe._readerAwaitable field has no CancellationTokenRegistration for it. You can also see that ReadAsync has not completed, so we’re blocked waiting for ReadAsync to honor the CancellationToken yet it never will because it wasn’t even registered to notice.

0000020b0e0e2048 Pipe
  0000020b0e0e20e8 _readerAwaitable
    10 _awaitableState
    0000020b0e0e2110 _cancellationTokenRegistration
      0 _id
      0 _node
0000020b0e0e2278 DefaultPipeReader.ReadAsync
  0000020b0e0e2048 _pipe
0000020b0dff7d38 System.IO.Pipelines.PipeReaderStream+<ReadAsyncInternal
  0000020b0dfdeeb8 cancellationToken.m_source (_state: 3, cancelled)
  0000020b0e0e2278 _pipeReader
0000020b0dff7df8 MessagePackStreamReader.TryReadMoreDataAsync
  0000020b0dfdeeb8 cancellationToken.m_source (_state: 3, cancelled)
  0000020b0dfde1a8 underlying System.IO.Pipelines.PipeReaderStream 
0000020b0dfde5b8 V2Formatter

Configuration

Windows 10 The hangs show up on .NET Framework and .NET Core 3.1 at least. x64 process System.IO.Pipelines v5.00.120.57516 (nuget package version v5.0.1)

Regression?

Unknown.

Other information

This bug was originally filed against Nerdbank.Streams as https://github.com/AArnott/Nerdbank.Streams/issues/387 before I tracked it into System.IO.Pipelines.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 25 (25 by maintainers)

Most upvoted comments

Ok, I found a cancellation bug in Nerdbank.Streams this time around (AArnott/Nerdbank.Streams#414). So till evidence suggests otherwise, I will consider this bug fixed with the original PR. @davidfowl, I will do some more testing of the Pipelines fix combined with the Nerdbank.Streams fix. If I don’t see any hangs in test runs, I’ll request that we resume the backport of the fix made for this bug.

This looked interesting, so I did a little digging and here is what I found:

https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L46-L60

If the cancellation token gets cancelled just before executing line 58 here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L58

then the callback will be invoked synchronously. The callback is actually: https://github.com/dotnet/runtime/blob/033b2ac49a3f473a790da21dd67cbea02ad93c8b/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/Pipe.cs#L1043-L1051

which calls PipeAwaitable https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L139-L150

However if we read carefully in CancellationTokenFired we find that Cancel(out completionData) is only called if CancellationToken.IsCancellationRequested. If we go to the definition of CancellationToken in PipeAwaitable it is defined as: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L21-L23

It uses the _cancellationTokenRegistration field to extract the cancellation token. However if we go back to the beginning, we will see that the _cancellationTokenRegistration is not yet set: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs#L58

The summary is that UnsafeRegister method invokes the callback synchronously, but the callback uses the result from UnsafeRegister. Writing a test for this particular case would be extremely difficult in my opinion.

//cc @davidfowl

Thank you, @BrennanConroy. I’ll take another look at the dump with that insight.

I can confirm from the dump that the System.IO.Pipelines.dll loaded in the process is 5.0.1+b02e13abab3d6f499751af2ad1ae6333fb4e633d.

I’m now not sure how this happens on .NET Framework since the cancellation token is assigned before registering for the callback

The dump file isn’t of the .NET Framework. It’s a .NET Core process (coreclr.dll 4,700,21,35901 @Commit: 2aee3eb10b6303dff09cbe5ac99b6fae4644f346).

If the bug you found that can be blamed for the cause of the hang in this dump can’t explain a failure on .NET Framework then I will find a .NET Framework dump as well that we can investigate.