runtime: [API Proposal + Bug] Make IListProvider public

Background and Motivation

I have some collections (specifically weak value collections) where it would be nice to have them implement ICollection<T>, but I am extremely hesitant to do that since weak collections are inherently “concurrent” (in the sense that values can be collected by the GC at any time) and thus the very commonly used ToArray() and ToList() enumerable extension methods have a high probability of returning a collection that is too large and contains some null values at the end if my collection implements ICollection<T> (see here for notes on a similar issue with ConcurrentDictionary). If I could implement IListProvider then I could provide a safe alternative that always returns the correct size array/list when the enumerable extension methods are called.

An additional benefit to making IListProvider public would come from being able to make collections like ConcurrentDictionary which live in a different assembly than IListProvider implement that interface so that the issue noted above could be mitigated and EnumerableExtensions.ToArray()/ToList() would be safe to call on it.

Update:

It appears that any LINQ methods that use Buffer (due to the call on this line) like OrderBy() are also currently unsafe to use on a ConcurrentDictionary and would become unsafe if the weak collections implemented ICollection<T>. That’s a really sneaky non-deterministic and very difficult to trace intermittent heisenbug that only shows up in high load situations waiting to happen. It also violates the principle of least surprise given that ConcurrentDictionary is safe to enumerate. I certainly did not know that this was an issue - I need to go back and review all my code now.

This proposal could fix that glaring problem.

Proposed API

Make IListProvider public.

Risks

None that I can think of.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 41 (30 by maintainers)

Most upvoted comments

Enumerating the dictionary is thread safe, thus calling LINQ methods on it should be thread safe. If LINQ didn’t use the Collection<T> optimization then it would be thread-safe to call LINQ methods on the ConcurrentDictionary. This is an internal implementation detail that breaks the contract provided by LINQ methods, which should be “if the object implements IEnumerable<T> in a safe manner then LINQ methods should operate against it safely”.

The IListProvider is already there, it would be incredibly simple to implement its 2 methods on ConcurrentDictionary and enable it to be thread safe, so its not “just how multithreading works”. Likewise, my collection could work safely with LINQ methods if that interface was exposed.

ConcurrentDictionary already has instance methods that do ToList and ToArray safely, it literally just needs to say that it implements that interface and it would be thread safe to call LINQ operations on it.

On my machine it eventually fails with ArgumentException: Destination array was not long enough. Check the destination index, length, and the array’s lower bounds. (Parameter ‘destinationArray’)

Yes, exactly.

You’re just not supposed to alter sequence while LINQ is performing a query on it.

ConcurrentDictionary specifically states it is safe to use the enumerator while reading and writing to it, and that can be properly achieved by the simple proposed fix above. I don’t get why you seem to actively want it to not be thread safe when it can be (and according to the docs should be under enumeration).

The issue isn’t with LINQ methods on IEnumerable being unsafe - LINQ is thread safe if the enumerator is thread safe. The issue is that it uses a non-thread-safe ICollection<T> optimization internally. It is simple enough to expose IListProvider so that concurrent collections can properly implement that in a thread-safe manner, as ConcurrentDictionary currently does with its ToList and ToArray methods, because IListProvider is used instead of the ICollection<T> optimization if the IEnumerable implements it.

There are literally no downsides to this.

I think I’m just going to go with adding IWeakCollection / IReadOnlyWeakCollection interfaces for now and the user can just ToList() the weak collection or whatnot if they need to pass it to something expecting an ICollection, probably safer that way anyway for interoperating with other code that depends on particular ICollection behavior.

I still think that ConcurrentDictionary with LINQ should be fixed though, it just seems like the only sensible thing to do given that it has a thread-safe enumerator. I’ll probably open up another issue for that with the ICollection<T>.ToArray() DIM idea as per your suggestion.

Because the enumerator in my collections do not return collected items, it skips over them. I know how the GC works 😋

Now that default interface methods are a thing, there could always be a ToArray() DIM added to ICollection<T> which defaults to the behavior LINQ does, but allows collections to override it. Then you wouldn’t need to special case collections and third-party collections can be happy too.

What is the other issue?

Well, the discussion also includes a different case than just ConcurrentDictionary, namely my collections which aren’t concurrent in the same way and I can’t just take locks to get around the issue like you can with a ConcurrentDictionary.

Regardless, it seems like an awful shame that LINQ can’t be used on CD when there are solutions that could fix this relatively easily. It’s rather odd to have a collection that can be foreached safely but using LINQ can make it spontaneously explode because of leaky internal implementation details.

If a type says it is safe to enumerate over and fulfills the IEnumerable<T> contract, LINQ methods need to be safe to call on it. Any “hidden detail” internal optimizations that LINQ methods do above calling IEnumerable<T> members should not create bugs, especially ones this difficult to trace. There needs to be a mechanism that collection types can use like the IListProvider interface that provides a way for those optimizations to work safely or at least some attribute like [NoLinqCollectionOptimizations] to disable those optimizations…anything less is pure insanity IMO. I would obviously prefer the former and given that it is already there and just needs to be made public that seems like the easier solution.

The fact that MS provided collections that say they are safe to enumerate over while concurrently reading/writing break when using LINQ operations and this isn’t heavily documented is troubling, to say the least.