runtime: BufferBlock.Completion never completes in a specific scenario

Hi! While writing some library code, I built a small TPL Dataflow pipeline of two blocks and found that the completion of the first block was not propagated to the second block. Here is a minimal example that reproduces this strange behavior:

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class Program
{
    static async Task Main()
    {
        var block1 = new BufferBlock<int>();
        var block2 = new BufferBlock<int>();
        block1.LinkTo(block2, new() { PropagateCompletion = true });
        block1.Post(1);
        block1.Complete();
        await block1.Completion;
        block2.TryReceiveAll(out var items);
        bool completed = block2.Completion.Wait(500);
        Console.WriteLine($"block2 completed in time: {completed}");
    }
}

Output:

block2 completed in time: False

The expected behavior is for block2 to complete immediately: block1 has already completed, the two blocks are linked with the PropagateCompletion = true option, and all the messages in block2 have been received with TryReceiveAll. However, block2 never completes in this scenario; calling block2.Completion.Wait() without a timeout blocks indefinitely.

Switching from BufferBlock to TransformBlock for either of the two blocks makes no difference; the issue remains.

Several subtle changes prevent this behavior:

  1. Adding a Thread.Sleep(100) after block1.Complete() solves the problem.
  2. Waiting for block1 synchronously (block1.Completion.Wait()) also solves the problem.
  3. Awaiting block2 asynchronously (await block2.Completion) solves the problem as well.
  4. Completing block2 manually (block2.Complete()) before waiting for its completion also fixes the problem.
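For illustration, here is a sketch of workaround 3 applied to the repro, assuming the same System.Threading.Tasks.Dataflow package. The names AsyncRepro and RunAsync, and the use of Task.WhenAny to keep a 500 ms timeout without blocking a thread, are my own choices and not part of the original report. Because nothing waits synchronously, the code after await block1.Completion yields back to the dataflow code instead of blocking it:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class AsyncRepro
{
    // Same pipeline as the repro, but block2 is awaited instead of being
    // waited on synchronously (workaround 3 from the list above).
    public static async Task<bool> RunAsync()
    {
        var block1 = new BufferBlock<int>();
        var block2 = new BufferBlock<int>();
        block1.LinkTo(block2, new DataflowLinkOptions { PropagateCompletion = true });
        block1.Post(1);
        block1.Complete();
        await block1.Completion;
        block2.TryReceiveAll(out _); // drain block2 so it is allowed to complete

        // Asynchronous timeout: if this code is running inline inside block1's
        // completion, the await below returns control to the dataflow code,
        // which can then propagate completion to block2.
        Task finished = await Task.WhenAny(block2.Completion, Task.Delay(500));
        return finished == block2.Completion;
    }

    public static async Task Main()
    {
        bool completed = await RunAsync();
        Console.WriteLine($"block2 completed in time: {completed}");
    }
}
```

This should report that block2 completed in time, consistent with observation 3 above.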

My guess is that some sort of race condition is taking place in this specific scenario.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

@mayorovp and @pedoc are right: this is a deadlock caused by combining await with synchronous Wait().

Specifically, it happens because of this part in the TPL Dataflow code:

https://github.com/dotnet/runtime/blob/6527f540e4b50bc84eb72705f80d3f2bdd57473b/src/libraries/System.Threading.Tasks.Dataflow/src/Internal/SourceCore.cs#L962-L968

The problem is that _completionTask.TrySetResult runs its continuations synchronously, so it directly invokes the rest of Main, the code after await block1.Completion;. That code then blocks waiting for block2.Completion, so TrySetResult never returns and _targetRegistry.PropagateCompletion() is never called. As a result, block2.Completion is never completed, which is a deadlock.
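The same ordering can be modeled without Dataflow. The sketch below is a hypothetical simplification, not the actual SourceCore code: CompleteSource stands in for the completion path linked above, one TaskCompletionSource plays the role of block1.Completion and another plays block2.Completion, and the 100 ms delay is an assumption used to make sure the caller is already awaiting. With default TaskCompletionSource options and no SynchronizationContext (as in a console app), the await continuation is expected to run inline inside TrySetResult, so the synchronous wait should time out:

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationModel
{
    // Hypothetical stand-in for the ordering described above: the source
    // completes its own Completion task first, then propagates to the target.
    static void CompleteSource(TaskCompletionSource<bool> source,
                               TaskCompletionSource<bool> target)
    {
        source.TrySetResult(true); // awaiting continuations may run inline here
        target.TrySetResult(true); // stand-in for PropagateCompletion()
    }

    public static async Task<bool> RunAsync()
    {
        var source = new TaskCompletionSource<bool>(); // default options
        var target = new TaskCompletionSource<bool>();

        // Complete the "source" later, from a thread-pool thread.
        _ = Task.Run(async () =>
        {
            await Task.Delay(100); // give the caller time to reach the await below
            CompleteSource(source, target);
        });

        await source.Task;
        // When this runs inline inside source.TrySetResult, target.TrySetResult
        // has not been called yet, so this wait times out.
        return target.Task.Wait(500);
    }

    public static async Task Main()
    {
        bool completed = await RunAsync();
        Console.WriteLine($"target completed in time: {completed}");
    }
}
```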

One resolution would be to say that it’s the user’s fault for combining async and sync code in this way, and close this issue. Another would be to create the _completionTask with the RunContinuationsAsynchronously option, which sacrifices some performance to prevent this deadlock. I think this is the way to go, so I have opened https://github.com/dotnet/runtime/pull/61140.
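To show the effect of that option, this sketch (again a hypothetical model, not the code in the PR) runs the "complete the source, then propagate to the target" sequence twice: once with TaskCreationOptions.None and once with TaskCreationOptions.RunContinuationsAsynchronously on the source's TaskCompletionSource. The class name ContinuationOptionsDemo and the 100 ms delay are illustrative assumptions. With the option set, TrySetResult only queues the awaiting continuation, so the propagation step runs right away and the synchronous wait should succeed:

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationOptionsDemo
{
    // Runs the same completion sequence with the given options on the
    // source's TaskCompletionSource; returns whether the sync wait succeeded.
    public static async Task<bool> RunAsync(TaskCreationOptions sourceOptions)
    {
        var source = new TaskCompletionSource<bool>(sourceOptions);
        var target = new TaskCompletionSource<bool>();

        _ = Task.Run(async () =>
        {
            await Task.Delay(100);     // let the caller reach the await below
            source.TrySetResult(true); // with RunContinuationsAsynchronously, only queues the continuation
            target.TrySetResult(true); // stand-in for propagating completion
        });

        await source.Task;
        return target.Task.Wait(500);  // sync-over-async, as in the repro
    }

    public static async Task Main()
    {
        bool withNone = await RunAsync(TaskCreationOptions.None);
        bool withAsync = await RunAsync(TaskCreationOptions.RunContinuationsAsynchronously);
        Console.WriteLine($"None: {withNone}");
        Console.WriteLine($"RunContinuationsAsynchronously: {withAsync}");
    }
}
```

The trade-off is the one mentioned above: each completion roughly costs an extra queued work item, in exchange for not running the awaiting user code inline inside the block's completion path.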