runtime: GC out of memory inside the docker container

Description

When running a .NET 7 console application inside a Docker container with a 400 MB memory limit, my application crashes, although dotMemory profiling shows that it uses just over 200 MB.

Reproduction Steps

This is my simple program to run:

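using System;
using System.Collections.Generic;
using System.Text;

internal class Program
{
    private static void Main(string[] args)
    {
        // Print the effective GC settings, including any heap limit picked up from the container.
        foreach (KeyValuePair<string, object> config in GC.GetConfigurationVariables())
        {
            Console.WriteLine($"{config.Key}   {config.Value}");
        }

        StringBuilder sb = new StringBuilder();

        while (true)
        {
            // 500,000 iterations x 3 GUIDs x 36 characters = 54,000,000 characters per round.
            for (int i = 0; i < 500_000; i++)
            {
                sb.Append(Guid.NewGuid());
                sb.Append(Guid.NewGuid());
                sb.Append(Guid.NewGuid());
            }

            string str = sb.ToString();

            // UTF-8 byte count of the text; the string itself is stored as UTF-16 (2 bytes per char).
            int allocatedBytesByString = Encoding.UTF8.GetByteCount(str);

            Console.WriteLine($"Allocated on this stage megabytes by string: {(allocatedBytesByString / 1024 / 1024)}");

            sb.Clear();

            // str is still referenced here, so it stays alive across sb.Clear().
            Console.WriteLine(str);
        }
    }
}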

This is the memory profile without memory limits: [image: dotMemory profiling]

This is the OutOfMemoryException I get when starting the Docker container with a 400 MB memory limit. The printed collector configuration shows that the GC noticed the memory limit and capped the heap size at 75% of it. [image: Out of memory]
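As a complement to the configuration dump, the limit the GC is working with can also be read at run time. The following is a minimal sketch, not part of the original report; the key name "GCHeapHardLimit" is an assumption about what GC.GetConfigurationVariables() returns on this runtime, so the lookup is guarded.

```csharp
using System;
using System.Collections.Generic;

// GCMemoryInfo describes the most recent GC, so force one before reading it.
GC.Collect();

GCMemoryInfo info = GC.GetGCMemoryInfo();
long availableToGc = info.TotalAvailableMemoryBytes;
Console.WriteLine($"Memory available to the GC: {availableToGc / 1024 / 1024} MB");

IReadOnlyDictionary<string, object> config = GC.GetConfigurationVariables();

// Assumed key name; if the runtime does not report it, nothing is printed.
if (config.TryGetValue("GCHeapHardLimit", out var hardLimit))
{
    Console.WriteLine($"GCHeapHardLimit: {hardLimit}");
}
```

Inside a container with a memory limit, the first value is expected to reflect the heap limit rather than the host's physical memory.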

The following screenshot shows that one large object occupies 51 megabytes:

[image: Allocated by string]

Expected behavior

I expected that the garbage collector would start collecting objects more often so as not to run into an OutOfMemoryException.

Actual behavior

In fact, when starting the application, we immediately get an OutOfMemoryException.

Regression?

No response

Known Workarounds

No response

Configuration

  • .NET 7.0.100
  • OS Windows 10 Pro 21H2
  • x64
  • Docker Server: Docker Desktop 4.20.0 (109717)

Other information

The reason for this is probably the following: dotMemory shows that in the first few seconds of operation the application consumes more memory than the limit allows (about 600 MB), and only then drops to about 200 MB.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

In my opinion, at the sb.Clear() line we have the StringBuilder buffer, held as a large number of char arrays with a total size of about 103 MB, plus a string of approximately the same size. The string cannot be collected because it is still needed on the next line, Console.WriteLine(str). sb.Clear() then tries to allocate a new buffer, also about 103 MB in size. At this point an OutOfMemoryException occurs because there is not enough memory to hold all three objects.

https://github.com/dotnet/runtime/blob/03f2be66ea268a1ea285899f56f55b81e5a25044/src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs#L421-L425
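The 103 MB figure can be checked against the repro loop. The sketch below is only back-of-the-envelope arithmetic under the assumption that every GUID is appended in its default 36-character format; it also shows why the program prints 51 MB, since it counts UTF-8 bytes while a .NET string stores UTF-16 code units.

```csharp
using System;
using System.Text;

// Figures taken from the repro loop: 500,000 iterations, 3 GUIDs each.
const int iterations = 500_000;
const int guidsPerIteration = 3;

// Default Guid format: 32 hex digits plus 4 hyphens.
string sample = Guid.NewGuid().ToString();
int charsPerGuid = sample.Length;

// GUID text is pure ASCII, so UTF-8 needs exactly one byte per character.
int utf8BytesPerGuid = Encoding.UTF8.GetByteCount(sample);

long totalChars = (long)iterations * guidsPerIteration * charsPerGuid;
long utf8Bytes = (long)iterations * guidsPerIteration * utf8BytesPerGuid;

// In memory, each char of a string takes sizeof(char) = 2 bytes.
long utf16Bytes = totalChars * sizeof(char);

Console.WriteLine($"Characters per round: {totalChars}");
Console.WriteLine($"UTF-8 bytes (what the repro prints): {utf8Bytes}");
Console.WriteLine($"UTF-16 bytes (what the string occupies): {utf16Bytes}");
```

This gives 54,000,000 characters, so roughly 108,000,000 bytes for the string payload (about 103 MB), which is close to the allocation size reported in the dump analysis further down.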

If we swap the order of sb.Clear() and Console.WriteLine(str), the string can already be collected by the time sb.Clear() runs, and only the StringBuilder buffer remains in memory. There is then enough memory, and the OutOfMemoryException does not occur.

You can also specify Capacity for the StringBuilder. In this case, sb.Clear() will not allocate a new buffer.
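Both suggestions can be combined in one loop. This is a minimal sketch with deliberately small, hypothetical sizes (1,000 GUIDs per round instead of 1,500,000) so that it runs anywhere; it is not the reporter's code. It relies on the behavior described above: when everything fits in the pre-sized buffer, Clear() only resets the length.

```csharp
using System;
using System.Text;

// Hypothetical sizes, much smaller than the repro.
const int guidsPerRound = 1_000;
const int charsPerGuid = 36; // length of Guid's default format

// Pre-size the builder so that all appends of one round fit in a single buffer.
var sb = new StringBuilder(capacity: guidsPerRound * charsPerGuid);
int initialCapacity = sb.Capacity;

for (int round = 0; round < 3; round++)
{
    for (int i = 0; i < guidsPerRound; i++)
    {
        sb.Append(Guid.NewGuid());
    }

    string str = sb.ToString();

    // Use the string before clearing, so it is no longer needed when Clear() runs.
    Console.WriteLine(str.Length);

    sb.Clear();
}

Console.WriteLine($"Capacity before: {initialCapacity}, after: {sb.Capacity}");
```

Whether the string is actually collectible at the Clear() call still depends on the JIT treating the local as dead after its last use, which is typical for optimized builds but not guaranteed in debug builds.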

In my opinion, diagnosing an OOM is very difficult or almost impossible without a memory dump of the process at the time of the crash, because an OOM can occur even on a line such as object o = (object)1.

One direction for improvement would be to optimize memory allocation after an OOM occurs, so that a correct stack trace is always produced. It might make sense to create a static OutOfMemoryException object and use it at crash locations instead of creating a new object each time.

It is also not entirely clear why StringBuilder.Clear() needs a new memory buffer when there is more than one link in the StringBuilder chain. It may be possible to reuse memory that was previously allocated.

Thanks for the dump, @MMaximus111! We took a look and found that based on your container size, the app simply ran out of memory and after trying to do a full blocking GC, threw an OOM exception.

Details

We deduced this by opening the dump in WinDbg and running !ao, the command that displays details about OOMs. It showed that the allocation that threw the exception requested 108002488 bytes.

0:000> !ao
Managed OOM occurred after GC #52 (Requested to allocate 108002488 bytes)
Reason: Didn't have enough memory to commit

By the time the OOM was thrown, the total amount of committed memory was 229957632 bytes.

0:000> x libcoreclr!*total_committed
00007f06`0dd30f10 libcoreclr!WKS::gc_heap::current_total_committed = 0xdb4e000
0:000> ? 0xdb4e000
Evaluate expression: 229957632 = 00000000`0db4e000

Adding the total committed bytes to the requested bytes gives 337960120 bytes, which is greater than your heap hard limit of 314572800 bytes.

0:000> ? 0n229957632 + 0n108002488 
Evaluate expression: 337960120 = 00000000`1424dcb8
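The same check can be written out as plain arithmetic. The sketch below only reuses the numbers quoted in this thread (the 400 MB container limit, the 75% heap cap mentioned in the report, and the two values read from the dump); it illustrates the comparison and is not how the GC itself implements it.

```csharp
using System;

// Numbers quoted in the thread.
const long containerLimitBytes = 400L * 1024 * 1024; // 400 MB container limit
const long committedBytes = 229_957_632;             // current_total_committed from the dump
const long requestedBytes = 108_002_488;             // allocation size reported by !ao

// The report states that the heap was limited to 75% of the container limit.
long heapHardLimit = containerLimitBytes * 75 / 100;

long needed = committedBytes + requestedBytes;
long shortfall = needed - heapHardLimit;
bool fits = needed <= heapHardLimit;

Console.WriteLine($"Heap hard limit: {heapHardLimit} bytes");
Console.WriteLine($"Committed + requested: {needed} bytes");
Console.WriteLine($"Fits under the limit: {fits} (short by {shortfall} bytes)");
```

With these inputs the hard limit is 314572800 bytes, and the request exceeds it by 23387320 bytes, a little over 22 MB.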

We also checked whether the GC did the right thing by attempting one last full blocking GC. It did, and although that GC was compacting, it didn’t help.

0:000> x libcoreclr!*gchist*
00007f06`0dd2f180 libcoreclr!WKS::gc_heap::gchist_index = 0n52
00007f06`0dd2f190 libcoreclr!WKS::gc_heap::gchist = WKS::gc_mechanisms_store [64]

0:000> dx -r1 (*((libcoreclr!WKS::gc_mechanisms_store *)0x7f060dd2f7f0))
(*((libcoreclr!WKS::gc_mechanisms_store *)0x7f060dd2f7f0))                 [Type: WKS::gc_mechanisms_store]
    [+0x000] gc_index         : 0x34 [Type: size_t]
    [+0x008] promotion        : true [Type: bool]
    [+0x009] compaction       : true [Type: bool]
    [+0x00a] loh_compaction   : true [Type: bool]
    [+0x00b] heap_expansion   : false [Type: bool]
    [+0x00c] concurrent       : false [Type: bool]
    [+0x00d] demotion         : false [Type: bool]
    [+0x00e] card_bundles     : true [Type: bool]
    [+0x00f] should_lock_elevation : true [Type: bool]
    [+0x010 ( 7: 0)] condemned_generation : 2 [Type: int]
    [+0x010 (15: 8)] gen0_reduction_count : 0 [Type: int]
    [+0x010 (23:16)] elevation_locked_count : 0 [Type: int]
    [+0x010 (31:24)] reason           : reason_oos_loh (0x6) [Type: gc_reason]
    [+0x014 ( 7: 0)] pause_mode       : pause_interactive (0x1) [Type: WKS::gc_pause_mode]
    [+0x014 (15: 8)] b_state          : bgc_not_in_process (0x0) [Type: bgc_state]
    [+0x016] found_finalizers : false [Type: bool]
    [+0x017] background_p     : false [Type: bool]
    [+0x018] stress_induced   : false [Type: bool]
    [+0x01c] entry_memory_load : 0x37 [Type: uint32_t]