runtime: BlockingCollection.TryTakeFromAny throws InvalidOperationException when underlying collection is ConcurrentBag

When the underlying collection for BlockingCollection<T> is ConcurrentBag<T>, concurrent calls to the static BlockingCollection<T>.TryTakeFromAny method can sometimes throw InvalidOperationException with the message “The underlying collection was modified from outside of the BlockingCollection<T>”. This can occur without any external modification to the collection.

This behavior is present in .NET Core 2.0/2.1 but not in .NET Framework 4.6.1.

Repro: https://gist.github.com/ReubenBond/98de2cede0d57a989ededa8e113b0f39#file-blockingcollection_concurrentbag_issue-cs

EDIT: This can be reproduced without TryTakeFromAny by replacing that line in the repro with success = blockingCollection.TryTake(out _).

EDIT 2: This does not reproduce with an underlying collection of type ConcurrentQueue<T>

EDIT 3: Updated repro to use ThreadPool instead of tasks - it reproduces much more frequently now.

About this issue

Original URL
State: closed
Created 6 years ago
Comments: 17 (17 by maintainers)

Most upvoted comments

I believe the problem is that when we rewrote ConcurrentBag for .NET Core to significantly improve its performance, as one small part of that we effectively removed this check: https://referencesource.microsoft.com/#System/sys/system/collections/concurrent/ConcurrentBag.cs,397

The problem with that, as this repro highlights, is that if multiple threads are taking/stealing, it’s possible that a thread may miss an item if it’s taken by another thread. Consider a situation with four threads each with their own local queue, all of which are currently empty. Then consider this ordering of operations:

Thread 2 adds an item, incrementing the BlockingCollection’s semaphore to 1.
Thread 4 tries to take an item; there is one, so it decrements the semaphore’s count to 0, finds its local queue empty, and starts searching for an item to steal. It looks at thread 1 and finds its local queue empty.
Thread 1 adds an item, incrementing the BC’s semaphore back to 1. Thread 4 just checked Thread 1’s queue, so it’s not going to check it again.
Thread 2 takes an item, decrementing the semaphore back to 0. It checks it own queue and finds that it contains an item. Success.
Thread 4 continues its search: it finds thread 2’s list is empty and then finds thread 3’s list is empty. It’s now looked at all of the lists, and returns false from TryTake, even though thread 1’s list contains an item and it successfully decremented the semaphore’s count. BlockingCollection throws.

So, even though there was an item in the collection that could have been taken, it missed it.

This sequence highlights why “Updated repro to use ThreadPool instead of tasks - it reproduces much more frequently now” made a difference: ThreadPool.QueueUserWorkItem(callback) puts work items into the global queue, whereas Task.Run from a thread pool thread puts the task into the thread’s queue… that means the thread that’s doing the add is very likely to keep doing adds rather than takes, which means it’ll be much less likely to get into a situation like with steps (1) and (4) in the above sequence, where the same thread needed to add then take.

Unfortunately I think we’re going to need to put back some kind of versioning check, where steals that fail check the versions, and if anything’s been added since, it tries again.

cc: @kouvel, @benaadams

stephentoub on Jul 2, 2018