runtime: Immutable* classes do not perform well
Looking through the source for ImmutableArray<T>
it seems it is more of a read-only array rather than a proper immutable class.
The difference being that read-only is designed to remain static and unchanged, whereas immutable is designed to be unchanged but with the ability to very quickly make a copy of the object with slight variations.
Immutable<T>
array copies every element on change, which doesn’t perform well at all.
A simple benchmark comparing ImmutableArray<T>
to the Lst
class in LanguageExt
, and to a simple (incomplete) class I wrote that stores the elements in a tree shows the inefficiency of the class. This benchmark was based on adding 100_000 integers.
Could the framework use some established “Persistent data structure” algorithms instead please?
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 44 (28 by maintainers)
No, this is very specifically not what
ImmutableArray
is designed for. That type is designed for scenarios where data is built up once with a builder then frozen, never to be added to again. The immutable types were first prototyped by the Roslyn compiler, which uses this pattern all over the place to build up semantic info then share it with confidence across multiple threads. The performance of read operations is much, much more important for this type than the performance of add operations. If you care about add operations and are willing to sacrifice read, use a different data structure, such asImmutableList
, which has a different set of priorities.ImmutableArray<T>
is more like a workaround of type system. It’s a slim wrapper of array that disallows writing. It’s performance goal is to be as fast as array for read operations.This would require fundamental change of data structure.
ImmutableList<T>
is more friendly for such usage.https://docs.microsoft.com/en-us/dotnet/api/system.collections.immutable.immutablearray-1
Almost O(1) is not O(1). As I said, read is the most important attribute of ImmutableArray. Any reduction from O(1) to O(log n) or some other non-constant will have significant detrimental impacts to the entire ecosystem, as it will significantly hurt the performance of the compiler, and by extension, every IDE or piece of tooling that uses it.
@theodorzoulias Yes, it is preferable. I am not disputing that. I am now utterly convinced there is a use for ImmutbleArray in its current form.
I don’t think so. It is fine when the collection is expected to be very small. This is not a rare corner case.
Naming
ImmutableArray.Add
andImmutableList.Add
the same has an advantage, though. It makes switching fromImmutableList
toImmutableArray
, or vice versa, easier. If one has been usingImmutableList
but then figures out thatImmutableArray
would be better in their situation, they only need to changeImmutableList
toImmutableArray
without needing to changeAdd
toCloneAndAdd
. One thing I don’t like is thatArray.Length
andList<T>.Count
have different names.I don’t see how this section specifies that immutability requires persistent data structures. It talks about how they can be used for immutable data structures, but doesn’t mention anything about the requirements.
Performance cuts both ways, right? I might expect that
ImmutableList
’s read performance is highly performant, but it isn’t. It’sO(log n)
. That isn’t acceptable for my use case: I very rarely add, but I read all the time. Having a data structure that is optimized for that scenario means that it’s perfectly fine to have anO(n)
add method as long as I getO(1)
read performance. As long as the performance of these methods is documented (which they are), it seems perfectly reasonable to me. Every data structure has its tradeoffs, and when selecting a data structure for an application it’s important to fully research the performance characteristics of all potential structures and understand them before committing. For example, for the required members feature I very specifically looked for a data structure that uses persistent data, as I wanted to build up immutable dictionaries across type hierarchies, and memory space efficiency is important to me in this scenario.The title of https://en.wikipedia.org/wiki/Persistent_data_structure is “Persistent data structure”, not “Immutable data structure”.
ImmutableArray.Add
method being O(n) is not terribly bad. It is still acceptable or even preferred in some cases for example when the array contains few elements. Even insertion sort has its place.The naming of
Immutable*
or the inclusion of derivative methods might be confusing, but the names consist of only a few words and cannot convey the details anyway. That is what documentations for. Besides, the nameImmutableArray
already hints that it may be backed by a plain .net array, and the coexistence of bothImmutableArray
andImmutableList
already hints that one may need to do some research (in particular read the documentations) before deciding whether to useImmutableArray
orImmutableList
. In addition,ImmutableArray<T>
is a struct, which also hints that this type may be somehow different.You could propose a change to make it less confusing, but obsoleting
ImmutableArray.Add
is not a solution.ImmutableArray.Add
is fine, as I said.Your comment was interesting and informative, thank you.
My comment was about how immutable algorithms (aka persistent data structures) were designed, which the .net class is following in name only. Roslyn could have used a ReadOnly collection if no derivatives were required.
I will try out ImmutableList to see how the performance compares. Although it makes little sense if one Immutable* class satisfies industry required standards and another does not.
PS: In case you’d like to consider changing it for future readers, your opening sentence can be misread as being a bit confrontational and condescending.