aspnetcore: Memory leak when serializing large string properties with System.Text.Json

Description:

I’ve encountered a significant memory issue when serializing large string properties with the built-in System.Text.Json serializer in ASP.NET Core. The behavior looks like a memory leak.

In the larger context of my application, out of more than 50,000 messages, only 3 contained large strings (each exceeding 20MB). Serializing those 3 messages caused memory consumption to jump from around 300 MB to several gigabytes. When I switched ASP.NET Core to JSON.NET (Newtonsoft.Json) as the serializer, the problem did not occur, and when I capped messages at a maximum of 100,000 characters, the issue also disappeared.

Here’s a simplified code snippet that reproduces the issue:

[HttpGet]
public IEnumerable<LogMessage> Get()
{
    // Build one large string: 10,000 GUID lines (~380,000 characters).
    var message = new StringBuilder();
    for (int i = 0; i < 10_000; i++)
    {
        message.AppendLine(Guid.NewGuid().ToString());
    }

    var logMessage = new LogMessage
    {
        Message = message.ToString()
    };

    // The same instance is added 100 times, so only one large string is
    // held in memory, but it is serialized 100 times per request.
    var logMessages = new List<LogMessage>();
    for (int i = 0; i < 100; i++)
    {
        logMessages.Add(logMessage);
    }

    return logMessages;
}

public class LogMessage
{
    public string Message { get; set; }
}

Observations:

  • After each call to the Get method, there’s a noticeable increase in memory consumption.
  • When I swap the values 10_000 and 100 (many copies of a small message instead of a few copies of a large one, with the same total payload), memory consumption remains relatively constant.

About this issue

  • State: open
  • Created 8 months ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

using System.Text.Json;

var enumerable = Enumerable.Repeat(new string('x', 400_000), 300);

// Serialize the same 300-string payload in a tight loop, rewinding the
// stream each time, and watch the process's working set.
var stream = new MemoryStream();
while (true)
{
    await JsonSerializer.SerializeAsync(stream, enumerable);
    stream.Position = 0;
}

This example isn’t a fair apples-to-apples comparison. Json rents its buffers from ArrayPool<T>.Shared, which keeps a thread-local bucket of pooled arrays per thread. And since this example is using MemoryStream, the code never actually goes async and thus stays on the same thread the entire time. If you add await Task.Yield() in the loop (sketched below), you will start to see a higher working set, because a new thread occasionally picks up the work and hits an empty bucket of pooled arrays.
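A minimal sketch of that Task.Yield() variation, reusing the enumerable from the snippet above:

var stream = new MemoryStream();
while (true)
{
    await JsonSerializer.SerializeAsync(stream, enumerable);
    stream.Position = 0;

    // Force the continuation back through the thread pool; the loop may
    // resume on a different thread with its own (possibly empty) ArrayPool bucket.
    await Task.Yield();
}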

> the latter will demonstrate unbounded increase the more that endpoint is being hit.

There is no unbounded increase in memory. The memory on my 16-core machine peaks around 700MB with Server GC, and with Workstation GC it peaks around 460MB, because the GC actually does work now. (Remember that Server GC sees lots of available memory on your machine and happily uses it.) A project-file switch for trying Workstation GC is shown below.
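For reference, a minimal way to opt into Workstation GC for this comparison is a project-file setting (the web SDK defaults to Server GC):

<PropertyGroup>
  <!-- Turn off Server GC to compare working sets -->
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>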

If you look at what memory is being used, you can see that Json uses the classic array-doubling strategy when it needs more room, and it reaches and stays at a steady state of a 16MB byte[] while writing the large enumerable. It starts at 2MB because the 400,000-character string times 3 (the worst-case bytes-per-char when transcoding to UTF-8) is 1.2MB, and array-pool buckets come in power-of-two sizes, so that request rounds up to 2MB. So Json rents 2 + 4 + 8 + 16 = 30MB from the array pool every time it writes the full enumerable, and 30MB * 16 cores = 480MB, which is roughly the max memory our process will consume just from the JSON code. The arithmetic is checked below.
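A back-of-the-envelope check of those numbers; the 3x worst-case transcoding factor and power-of-two buckets are the assumptions here:

using System.Numerics;

const int chars = 400_000;
long worstCaseBytes = chars * 3L;  // 1,200,000 bytes ≈ 1.2MB worst-case UTF-8
ulong firstBucket = BitOperations.RoundUpToPowerOf2((ulong)worstCaseBytes); // 2,097,152 bytes = 2MB
long rentedMB = 2 + 4 + 8 + 16;    // doubling from 2MB up to the 16MB steady state
Console.WriteLine($"first bucket: {firstBucket} bytes");
Console.WriteLine($"{rentedMB}MB per pass * 16 cores = {rentedMB * 16}MB");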

Another semi-big chunk of memory (~40MB) when using Server GC is from System.IO.Pipelines.BufferSegment, which comes from Kestrel due to the internal pooling done by Pipes and which is sadly thrown away when the connection is closed. See https://github.com/dotnet/runtime/issues/49259 for some details. But this is all dead memory that the GC can clean up when it feels like it. The rest of the memory is small in comparison and not worth examining for this discussion.

I did a quick spike of https://github.com/dotnet/runtime/issues/68586, which David mentioned earlier in this thread, and it shows significant improvements in working set when writing large payloads. In short, with Server GC, having Json write directly to the Pipe results in a peak of roughly 140MB. This is largely because Json only rents a 2MB buffer from the array pool when writing, instead of doubling all the way up to 16MB. I didn’t look too deeply into the Json code to figure out why it grabs such big arrays from the array pool, but this might be a potential improvement worth looking into.
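For context, a minimal sketch of what "writing directly to the Pipe" can look like, relying only on the fact that PipeWriter implements IBufferWriter<byte> (an illustration, not the spike's actual code):

using System.IO.Pipelines;
using System.Text.Json;

// Utf8JsonWriter writes into the pipe's own buffers, so completed output can
// be flushed downstream instead of accumulating in one large doubling array.
static async Task WriteArrayAsync(PipeWriter writer, IEnumerable<string> values)
{
    await using var json = new Utf8JsonWriter(writer);
    json.WriteStartArray();
    foreach (var value in values)
    {
        json.WriteStringValue(value);
        await json.FlushAsync(); // hand what's written so far to the pipe
    }
    json.WriteEndArray();
    await json.FlushAsync();
    await writer.FlushAsync(); // let the pipe's reader consume the data
}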

I think we can close this issue, and potentially open a new one against Json about not allocating such large arrays. I’ll follow up with some more detailed analysis of Json + Pipes in https://github.com/dotnet/runtime/issues/68586.