runtime: [API Proposal + Bug] Make IListProvider public
Background and Motivation
I have some collections (specifically weak value collections) where it would be nice to have them implement ICollection<T>
, but I am extremely hesitant to do that since weak collections are inherently “concurrent” (in the sense that values can be collected by the GC at any time) and thus the very commonly used ToArray()
and ToList()
enumerable extension methods have a high probability of returning a collection that is too large and contains some null
values at the end if my collection implements ICollection<T>
(see here for notes on a similar issue with ConcurrentDictionary
). If I could implement IListProvider
then I could provide a safe alternative that always returns the correct size array/list when the enumerable extension methods are called.
An additional benefit to making IListProvider
public would come from being able to make collections like ConcurrentDictionary
which live in a different assembly than IListProvider
implement that interface so that the issue noted above could be mitigated and EnumerableExtensions.ToArray()/ToList()
would be safe to call on it.
Update:
It appears that any LINQ methods that use Buffer
(due to the call on this line) like OrderBy()
are also currently unsafe to use on a ConcurrentDictionary
and would become unsafe if the weak collections implemented ICollection<T>
. That’s a really sneaky non-deterministic and very difficult to trace intermittent heisenbug that only shows up in high load situations waiting to happen. It also violates the principle of least surprise given that ConcurrentDictionary
is safe to enumerate. I certainly did not know that this was an issue - I need to go back and review all my code now.
This proposal could fix that glaring problem.
Proposed API
Make IListProvider
public.
Risks
None that I can think of.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 41 (30 by maintainers)
Enumerating the dictionary is thread safe, thus calling LINQ methods on it should be thread safe. If LINQ didn’t use the
Collection<T>
optimization then it would be thread-safe to call LINQ methods on theConcurrentDictionary
. This is an internal implementation detail that breaks the contract provided by LINQ methods, which should be “if the object implements IEnumerable<T> in a safe manner then LINQ methods should operate against it safely”.The
IListProvider
is already there, it would be incredibly simple to implement its 2 methods onConcurrentDictionary
and enable it to be thread safe, so its not “just how multithreading works”. Likewise, my collection could work safely with LINQ methods if that interface was exposed.ConcurrentDictionary already has instance methods that do
ToList
andToArray
safely, it literally just needs to say that it implements that interface and it would be thread safe to call LINQ operations on it.Yes, exactly.
ConcurrentDictionary
specifically states it is safe to use the enumerator while reading and writing to it, and that can be properly achieved by the simple proposed fix above. I don’t get why you seem to actively want it to not be thread safe when it can be (and according to the docs should be under enumeration).The issue isn’t with LINQ methods on IEnumerable being unsafe - LINQ is thread safe if the enumerator is thread safe. The issue is that it uses a non-thread-safe
ICollection<T>
optimization internally. It is simple enough to exposeIListProvider
so that concurrent collections can properly implement that in a thread-safe manner, asConcurrentDictionary
currently does with itsToList
andToArray
methods, becauseIListProvider
is used instead of theICollection<T>
optimization if the IEnumerable implements it.There are literally no downsides to this.
I think I’m just going to go with adding
IWeakCollection
/IReadOnlyWeakCollection
interfaces for now and the user can justToList()
the weak collection or whatnot if they need to pass it to something expecting anICollection
, probably safer that way anyway for interoperating with other code that depends on particularICollection
behavior.I still think that
ConcurrentDictionary
with LINQ should be fixed though, it just seems like the only sensible thing to do given that it has a thread-safe enumerator. I’ll probably open up another issue for that with theICollection<T>.ToArray()
DIM idea as per your suggestion.Because the enumerator in my collections do not return collected items, it skips over them. I know how the GC works 😋
Now that default interface methods are a thing, there could always be a
ToArray()
DIM added toICollection<T>
which defaults to the behavior LINQ does, but allows collections to override it. Then you wouldn’t need to special case collections and third-party collections can be happy too.Well, the discussion also includes a different case than just
ConcurrentDictionary
, namely my collections which aren’t concurrent in the same way and I can’t just take locks to get around the issue like you can with aConcurrentDictionary
.Regardless, it seems like an awful shame that LINQ can’t be used on CD when there are solutions that could fix this relatively easily. It’s rather odd to have a collection that can be
foreach
ed safely but using LINQ can make it spontaneously explode because of leaky internal implementation details.If a type says it is safe to enumerate over and fulfills the
IEnumerable<T>
contract, LINQ methods need to be safe to call on it. Any “hidden detail” internal optimizations that LINQ methods do above callingIEnumerable<T>
members should not create bugs, especially ones this difficult to trace. There needs to be a mechanism that collection types can use like theIListProvider
interface that provides a way for those optimizations to work safely or at least some attribute like[NoLinqCollectionOptimizations]
to disable those optimizations…anything less is pure insanity IMO. I would obviously prefer the former and given that it is already there and just needs to be made public that seems like the easier solution.The fact that MS provided collections that say they are safe to enumerate over while concurrently reading/writing break when using LINQ operations and this isn’t heavily documented is troubling, to say the least.