runtime: Native memory leak with ETW EventSource

Reported originally at https://github.com/Microsoft/ApplicationInsights-dotnet/issues/1102

Repros on .NET Core 2.2, x86 and x64 on Windows.

Build the following application:

using System.Diagnostics.Tracing;

namespace RPESLeak
{
    class MyEventSource : EventSource
    {
        private const EventKeywords Operations = (EventKeywords)0x1;

        private const string EventProviderName = "PSH-LeakTest";

        public MyEventSource() : base(
               EventProviderName,
               EventSourceSettings.EtwSelfDescribingEventFormat)
        {
        }

        public void ProcessOperationStart(long operation)
        {
            if (IsEnabled(EventLevel.Informational, Operations))
            {
                var payload = new { Leak = "leaky", Id = operation };
                Write(
                    "Request",
                    new EventSourceOptions { Keywords = Operations, Opcode = EventOpcode.Start, Level = EventLevel.Informational },
                    payload);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var eventSource = new MyEventSource();
            for (long i = 0; ; i++)
            {
                eventSource.ProcessOperationStart(i);
            }
        }
    }
}

Run it on Windows and watch the memory usage of dotnet.exe in Task Manager. It settles down on my machine around 10 MB.
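For convenience, the repro can be made self-reporting so the growth is visible without Task Manager. The following is a sketch along the lines of the code above that periodically prints the process's private bytes (the iteration count, print interval, and MB formatting are my own choices, not part of the original repro):

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Tracing;

// Variant of the repro that prints private bytes as it runs, so the
// native-heap growth is visible without Task Manager.
class LeakSource : EventSource
{
    public LeakSource()
        : base("PSH-LeakTest", EventSourceSettings.EtwSelfDescribingEventFormat) { }

    public void Start(long id)
    {
        if (IsEnabled(EventLevel.Informational, (EventKeywords)0x1))
        {
            Write("Request",
                  new EventSourceOptions
                  {
                      Keywords = (EventKeywords)0x1,
                      Opcode = EventOpcode.Start,
                      Level = EventLevel.Informational
                  },
                  new { Leak = "leaky", Id = id });
        }
    }
}

class Program
{
    static void Main()
    {
        var source = new LeakSource();
        var proc = Process.GetCurrentProcess();
        for (long i = 0; i < 5_000_000; i++)
        {
            source.Start(i);
            if (i % 1_000_000 == 0)
            {
                proc.Refresh(); // Process caches counters until refreshed
                Console.WriteLine(
                    $"i={i}  private bytes: {proc.PrivateMemorySize64 / (1024 * 1024)} MB");
            }
        }
    }
}
```

With no ETW listener attached, private bytes stay flat; with the provider enabled, the printed numbers climb rapidly.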

Now enable the ETW event listener for the event source. The easiest way to do that is to run PerfView. Press Alt+C to open the collection dialog and configure it as follows (screenshot omitted):

The important thing is to add this under Additional Providers: PSH-LeakTest::Informational. You can turn off all the other options. Now, when you start logging, watch the memory usage of dotnet.exe climb like crazy (over 200 MB in 5 seconds). When you stop logging, the memory isn't freed. If you create a dump of the dotnet.exe process, the GC heap is fine. Most of the additional memory is in the native heap, and it appears to consist of the event manifests repeated over and over again.

This does not reproduce on .NET Framework 4.7.2. I don't know about earlier .NET Core runtimes.

Please let me know if you need any more information, but I think this is a pretty minimal repro with clear instructions. One thing I could do is eliminate PerfView by writing a custom app based around TraceEvent or even a WPR profile.
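A PerfView-free collector along those lines might look like the following sketch, assuming the Microsoft.Diagnostics.Tracing.TraceEvent NuGet package and an elevated prompt; the session name and two-minute duration are arbitrary choices:

```csharp
using System;
using System.Threading;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Session;

class Collector
{
    static void Main()
    {
        // ETW sessions require administrator rights.
        using (var session = new TraceEventSession("PSH-LeakTest-Session"))
        {
            // Enable only the suspect provider at Informational level,
            // mirroring the "Additional Providers" entry in PerfView.
            // ("*PSH-LeakTest" forces the EventSource name-hash GUID if
            // the plain name is not found.)
            session.EnableProvider("*PSH-LeakTest", TraceEventLevel.Informational);
            Console.WriteLine("Collecting for 2 minutes; watch dotnet.exe memory...");
            Thread.Sleep(TimeSpan.FromMinutes(2));
        } // Dispose() stops the ETW session.
    }
}
```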

To be clear about how this affects Application Insights customers: the Application Insights Profiler runs in Azure App Services. It captures ETW sessions for 2 minutes at a time, about once an hour. One of the providers it enables (listens to) is the Application Insights "Rich Payload Event Source" (see the original bug in the Application Insights repo). Customers are now encountering these memory leaks under load while profiling is running, leading to application restarts.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 37 (25 by maintainers)

Most upvoted comments

I’m preparing the port to 2.2 right now.

@jorive - it sounds like this bug is rather severe. I think we have a strong case to back-port it to V2.2. We should pursue that.

cc @josalem

This is definitely not using a circular buffer as I have an instance that’s been running for 9 days and is now at an approximate 7GB commit size and still growing!

That definitely doesn’t sound normal to me either.

I have confirmed that .NET Core 3.0 Preview 8 does not experience the same issue.

This makes me less nervous. That being said, AFAIK @jorive already ported his fix (https://github.com/dotnet/coreclr/pull/23661) to 2.2, and if it’s not addressing the issue, we might be looking at a different issue from the one that was already ported.

Do you happen to have a memory dump and/or a small repro that I can use to repro the behavior?

@mjsabby memory isn’t used when the pipe is disabled; when it’s enabled, memory usage climbs like crazy under traffic, while heap usage remains at the same level. @jorive is the buffer size configurable? It looks like it’s a little too much to use 1 GB of memory if you just want to gather some basic metrics for a dashboard and monitoring.

Including @brianrob who wrote this.

I took a quick look and the key place is the stack

Name
--
||  ||||||  |+ system.private.corelib!System.Diagnostics.Tracing.EventSource.Write[System.__Canon](System.String, System.Diagnostics.Tracing.EventSourceOptions, System.__Canon)
||  ||||||  ||+ system.private.corelib!EventSource.WriteImpl
||  ||||||  |||+ system.private.corelib!NameInfo.GetOrCreateEventHandle
||  ||||||  ||||+ system.private.corelib!IEventProvider.DefineEventHandle
||  ||||||  |||||+ coreclr!EventPipeInternal::DefineEvent
||  ||||||  ||||| + coreclr!EventPipeProvider::AddEvent

This says that, starting with an event write, we try to create a handle for it (for EventPipe), and this seems to allocate ALL THE TIME. However, a handle to an event is about its meta-data, and thus there should be a lookup, and MOST of the time we should NOT be creating new event meta-data.

Thus the basic issue is that we are missing our ‘lookup’ that would find that we already have meta-data for this event.
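The missing lookup is, in effect, a get-or-create cache keyed by the event's identity: define the metadata once per distinct event, then return the cached handle on every subsequent write. The sketch below is illustrative only; `MetadataCache` and its members are invented names, not the actual coreclr/`NameInfo` code:

```csharp
using System;
using System.Collections.Generic;

// Illustrative get-or-create table: defining an event's metadata should
// allocate once per distinct event, not once per Write() call.
// MetadataCache and DefineEventHandle are invented names, not coreclr code.
class MetadataCache
{
    private readonly Dictionary<string, IntPtr> _handles =
        new Dictionary<string, IntPtr>();
    private int _nextHandle = 1;
    public int DefineCount { get; private set; }

    public IntPtr GetOrCreateEventHandle(string eventName)
    {
        // The lookup that the buggy path skipped: without it, every
        // write redefined the event and leaked native metadata.
        if (!_handles.TryGetValue(eventName, out var handle))
        {
            handle = DefineEventHandle(eventName);
            _handles[eventName] = handle;
        }
        return handle;
    }

    private IntPtr DefineEventHandle(string eventName)
    {
        DefineCount++; // stands in for the native EventPipeProvider::AddEvent allocation
        return new IntPtr(_nextHandle++);
    }
}

class Demo
{
    static void Main()
    {
        var cache = new MetadataCache();
        for (int i = 0; i < 1000; i++)
            cache.GetOrCreateEventHandle("Request"); // same event every iteration
        Console.WriteLine($"Defines: {cache.DefineCount}"); // prints "Defines: 1"
    }
}
```

With the lookup in place, a thousand writes of the same event define its metadata exactly once; without it, every iteration would allocate.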

@brianrob - Can you quickly tell us where it SHOULD be happening?

There is a secondary issue that we should not need this meta-data at all unless EventPipe is turned on (which it is not), so it would be nice if we avoided even generating the handle if EventPipe is off.

@jorive - We should also check the scenario where we are collecting with EventPipe instead of ETW before we call this bug fixed.

Thanks @pharring, I am able to repro it locally with the code above.

@pharring I’m taking a look.

The thought occurred to me that PerfView is just the tool for tracking down the heap allocations. Indeed, here are the relevant allocating stacks:

Name Inc %
||      + ntdll!RtlpAllocateHeapInternal 98.6
||       + coreclr!EEHeapAllocInProcessHeap 98.6
||        + coreclr!operator new[] 50.2
||        |+ coreclr!EventPipeEvent::EventPipeEvent 50.2
||        ||+ coreclr!EventPipeProvider::AddEvent 50.2
||        || + coreclr!EventPipeInternal::DefineEvent 50.2
||        ||  + system.private.corelib!IEventProvider.DefineEventHandle 50.2
||        ||   + system.private.corelib!NameInfo.GetOrCreateEventHandle 50.2
||        ||    + system.private.corelib!EventSource.WriteImpl 50.2
||        ||     + ?!? 50.2
||        |+ coreclr!CStructArray::Grow 0.0
||        + coreclr!operator new 48.4
||         + coreclr!EventPipeProvider::AddEvent 48.4
||          + coreclr!EventPipeInternal::DefineEvent 33.8
||          |+ system.private.corelib!IEventProvider.DefineEventHandle 33.8
||          | + system.private.corelib!NameInfo.GetOrCreateEventHandle 33.8
||          |  + system.private.corelib!EventSource.WriteImpl 33.8
||          |   + ?!? 33.8
||          + coreclr!EventPipeProvider::AddEvent 14.6
||           + coreclr!EventPipeInternal::DefineEvent 14.6
||            + system.private.corelib!IEventProvider.DefineEventHandle 14.6
||             + system.private.corelib!NameInfo.GetOrCreateEventHandle 14.6
||              + system.private.corelib!EventSource.WriteImpl 14.6
||               + ?!? 14.6