runtime: Immutable* classes do not perform well

Looking through the source for ImmutableArray<T> it seems it is more of a read-only array rather than a proper immutable class.

The difference being that read-only is designed to remain static and unchanged, whereas immutable is designed to be unchanged but with the ability to very quickly make a copy of the object with slight variations.

Immutable<T> array copies every element on change, which doesn’t perform well at all.

A simple benchmark comparing ImmutableArray<T> to the Lst class in LanguageExt, and to a simple (incomplete) class I wrote that stores the elements in a tree shows the inefficiency of the class. This benchmark was based on adding 100_000 integers.

image

Could the framework use some established “Persistent data structure” algorithms instead please?

https://en.wikipedia.org/wiki/Persistent_data_structure

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 44 (28 by maintainers)

Most upvoted comments

whereas immutable is designed to be unchanged but with the ability to very quickly make a copy of the object with slight variations.

No, this is very specifically not what ImmutableArray is designed for. That type is designed for scenarios where data is built up once with a builder then frozen, never to be added to again. The immutable types were first prototyped by the Roslyn compiler, which uses this pattern all over the place to build up semantic info then share it with confidence across multiple threads. The performance of read operations is much, much more important for this type than the performance of add operations. If you care about add operations and are willing to sacrifice read, use a different data structure, such as ImmutableList, which has a different set of priorities.

ImmutableArray<T> is more like a workaround of type system. It’s a slim wrapper of array that disallows writing. It’s performance goal is to be as fast as array for read operations.

whereas immutable is designed to be unchanged but with the ability to very quickly make a copy of the object with slight variations.

This would require fundamental change of data structure. ImmutableList<T> is more friendly for such usage.

There is nothing in the docs to indicate this is not a class that implements high performance immutable principles

https://docs.microsoft.com/en-us/dotnet/api/system.collections.immutable.immutablearray-1

There are different scenarios best for ImmutableArray<T> and others best for ImmutableList<T>.

Reasons to use immutable array: Updating the data is rare or the number of elements is quite small (less than 16 items) You need to be able to iterate over the data in performance critical sections You have many instances of immutable collections and you can’t afford keeping the data in trees

Reasons to use immutable list: Updating the data is common or the number of elements isn’t expected to be small Updating the collection is more performance critical than iterating the contents

The following table summarizes the performance characteristics of ImmutableArray<T>

Almost O(1) is not O(1). As I said, read is the most important attribute of ImmutableArray. Any reduction from O(1) to O(log n) or some other non-constant will have significant detrimental impacts to the entire ecosystem, as it will significantly hurt the performance of the compiler, and by extension, every IDE or piece of tooling that uses it.

@theodorzoulias Yes, it is preferable. I am not disputing that. I am now utterly convinced there is a use for ImmutbleArray in its current form.

ImmutableArray should nearly always be avoided when you need derivates of the value.

I don’t think so. It is fine when the collection is expected to be very small. This is not a rare corner case.

Naming ImmutableArray.Add and ImmutableList.Add the same has an advantage, though. It makes switching from ImmutableList to ImmutableArray, or vice versa, easier. If one has been using ImmutableList but then figures out that ImmutableArray would be better in their situation, they only need to change ImmutableList to ImmutableArray without needing to change Add to CloneAndAdd. One thing I don’t like is that Array.Length and List<T>.Count have different names.

It’s further down the page = https://en.wikipedia.org/wiki/Persistent_data_structure#Applications_of_persistent_data_structures

I don’t see how this section specifies that immutability requires persistent data structures. It talks about how they can be used for immutable data structures, but doesn’t mention anything about the requirements.

it is absolutely right that the consumer should assume those methods are highly performant using well established algorithms.

Performance cuts both ways, right? I might expect that ImmutableList’s read performance is highly performant, but it isn’t. It’s O(log n). That isn’t acceptable for my use case: I very rarely add, but I read all the time. Having a data structure that is optimized for that scenario means that it’s perfectly fine to have an O(n) add method as long as I get O(1) read performance. As long as the performance of these methods is documented (which they are), it seems perfectly reasonable to me. Every data structure has its tradeoffs, and when selecting a data structure for an application it’s important to fully research the performance characteristics of all potential structures and understand them before committing. For example, for the required members feature I very specifically looked for a data structure that uses persistent data, as I wanted to build up immutable dictionaries across type hierarchies, and memory space efficiency is important to me in this scenario.

The title of https://en.wikipedia.org/wiki/Persistent_data_structure is “Persistent data structure”, not “Immutable data structure”.

ImmutableArray.Add method being O(n) is not terribly bad. It is still acceptable or even preferred in some cases for example when the array contains few elements. Even insertion sort has its place.

The naming of Immutable* or the inclusion of derivative methods might be confusing, but the names consist of only a few words and cannot convey the details anyway. That is what documentations for. Besides, the name ImmutableArray already hints that it may be backed by a plain .net array, and the coexistence of both ImmutableArray and ImmutableList already hints that one may need to do some research (in particular read the documentations) before deciding whether to use ImmutableArray or ImmutableList. In addition, ImmutableArray<T> is a struct, which also hints that this type may be somehow different.

You could propose a change to make it less confusing, but obsoleting ImmutableArray.Add is not a solution. ImmutableArray.Add is fine, as I said.

Your comment was interesting and informative, thank you.

My comment was about how immutable algorithms (aka persistent data structures) were designed, which the .net class is following in name only. Roslyn could have used a ReadOnly collection if no derivatives were required.

I will try out ImmutableList to see how the performance compares. Although it makes little sense if one Immutable* class satisfies industry required standards and another does not.

PS: In case you’d like to consider changing it for future readers, your opening sentence can be misread as being a bit confrontational and condescending.