runtime: Enumerable.Chunk throws OutOfMemoryException for very large chunk sizes.

Description

The exception is thrown even if the actual size of the enumerable is very small and the operation could (in theory) complete.

Reproduction Steps

new[] { 1 }.Chunk(int.MaxValue).ToArray();

Expected behavior

Returns a single-element enumerable containing one chunk array of length 1.

Actual behavior

Throws OutOfMemoryException because the implementation eagerly attempts to allocate an array of the full chunk size.

Regression?

No response

Known Workarounds

No response

Configuration

.NET 6, VS2022, Windows 10 x64.

Other information

One could argue that this shouldn’t be fixed because callers just shouldn’t ask for chunks that large. However, this can come up when using int.MaxValue as a means of “disabling” chunking/batching, especially when said batching is performed deeper in the call stack.

A fix for this would be to special-case the creation of the first chunk when size exceeds a certain threshold. For large sizes, we can assume the chunk probably won't fill, and therefore build up to the full size (as ToArray() does) rather than starting by trying to allocate the final array.
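The idea above can be sketched as follows. This is a minimal illustration, not the actual System.Linq implementation; the method name `ChunkLazily` is hypothetical, and for simplicity it buffers every chunk in a List<T> rather than special-casing only the first one:

```csharp
using System;
using System.Collections.Generic;

static class ChunkSketch
{
    public static IEnumerable<T[]> ChunkLazily<T>(IEnumerable<T> source, int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));

        using IEnumerator<T> e = source.GetEnumerator();
        while (e.MoveNext())
        {
            // Grow a buffer as elements arrive instead of allocating
            // a size-element array up front, so Chunk(int.MaxValue)
            // over a tiny sequence never triggers a huge allocation.
            var buffer = new List<T> { e.Current };
            while (buffer.Count < size && e.MoveNext())
                buffer.Add(e.Current);
            yield return buffer.ToArray();
        }
    }
}
```

With this sketch, `ChunkSketch.ChunkLazily(new[] { 1 }, int.MaxValue)` yields a single chunk of length 1 without attempting an int.MaxValue-length allocation.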

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 25 (22 by maintainers)

Most upvoted comments

It’s not worth special-casing

Using a single List<T> to buffer each chunk seems reasonable.

The extra cost of this vs. special-casing the first chunk is just having to clear the list between iterations and having to copy the elements one more time from the list to the array, right?

On my machine, using a list for every chunk is about 10-15% slower than just using a list for the first chunk, which isn’t too bad.

Both approaches are much better than the current implementation in lopsided cases like chunk size 1,000,000 with just 10 elements.

Do you think that using a list every time is better than the current implementation given that it supports the lopsided cases better?

I wouldn’t be opposed to a solution that reverts to using a list when size exceeds a given threshold. It might help running benchmarks to determine the appropriate cutoff.

I have encountered this, and I've used int.MaxValue to "disable" chunking this way. In my own utility method I make sure not to eagerly allocate large arrays.

Now that I know about this issue, I won't be able to replace my utility method with Chunk; my code would break.

It’s not just about OOM. It’s undesirable to allocate a 1 million element array if the actual sequence is just a few items. Let’s say you batch database inserts 1 million items at a time but it is common to have just a few.

Special-casing int.MaxValue is neither necessary nor a sufficient fix. There should be an upper bound on the internal array allocation. You could argue for something as low as 4 (optimizing for small actual item counts) or something as high as 64,000 (merely preventing massive allocations that turn out to be unnecessary).

It’s a difficult choice to make, but in my experience, large size arguments combined with small actual item counts are a very real scenario.
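A capped-allocation variant of that suggestion might look like this. It is a sketch under stated assumptions: the class name `CappedChunk` and the cutoff of 4096 are hypothetical (the comment above argues for anywhere between 4 and 64,000, and benchmarks would pick the real value):

```csharp
using System;
using System.Collections.Generic;

static class CappedChunk
{
    // Assumed cutoff for the eager allocation; not from the source.
    private const int MaxInitialCapacity = 4096;

    public static IEnumerable<T[]> Chunk<T>(IEnumerable<T> source, int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));

        using IEnumerator<T> e = source.GetEnumerator();
        while (e.MoveNext())
        {
            // Pre-allocate at most MaxInitialCapacity elements; if the
            // chunk actually fills beyond that, List<T> grows as needed.
            var buffer = new List<T>(Math.Min(size, MaxInitialCapacity)) { e.Current };
            while (buffer.Count < size && e.MoveNext())
                buffer.Add(e.Current);
            yield return buffer.ToArray();
        }
    }
}
```

This keeps the common path (small sizes) allocating exactly once, while bounding the worst-case wasted allocation for lopsided calls like `Chunk(source, 1_000_000)` over a handful of items.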