runtime: Native memory leak with ETW EventSource
Reported originally at https://github.com/Microsoft/ApplicationInsights-dotnet/issues/1102
Repros on .NET Core 2.2, x86 and x64 on Windows.
Build the following application:
using System.Diagnostics.Tracing;

namespace RPESLeak
{
    // Minimal EventSource that writes self-describing ETW events.
    class MyEventSource : EventSource
    {
        private const EventKeywords Operations = (EventKeywords)0x1;
        private const string EventProviderName = "PSH-LeakTest";

        public MyEventSource() : base(
            EventProviderName,
            EventSourceSettings.EtwSelfDescribingEventFormat)
        {
        }

        public void ProcessOperationStart(long operation)
        {
            if (IsEnabled(EventLevel.Informational, Operations))
            {
                var payload = new { Leak = "leaky", Id = operation };
                Write(
                    "Request",
                    new EventSourceOptions { Keywords = Operations, Opcode = EventOpcode.Start, Level = EventLevel.Informational },
                    payload);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var eventSource = new MyEventSource();

            // Write events in a tight loop; the leak only shows up while an
            // ETW session has the provider enabled.
            for (long i = 0; ; i++)
            {
                eventSource.ProcessOperationStart(i);
            }
        }
    }
}
Run it on Windows and watch the memory usage of dotnet.exe in Task Manager. On my machine it settles down at around 10 MB.
Now enable an ETW listener for the event source. The easiest way to do that is to run PerfView: press Alt+C to open the collection dialog.
The important thing is to add this in Additional Providers:
PSH-LeakTest::Informational
You can turn off all the other options.
Now, when you start logging, watch the memory usage of dotnet.exe climb like crazy (over 200 MB in 5 seconds).
When you stop logging, the memory isn't freed. If you create a dump of the dotnet.exe process, the GC heap is fine. Most of the additional memory is in the native heap, and it seems to consist of the event manifests repeated over and over again.
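If it helps, the split between managed and native memory can also be watched from inside the repro without taking a dump. Below is a minimal sketch (the MemoryProbe name and the sampling interval are made up for illustration); calling MemoryProbe.Report(i) inside the loop in Main prints the managed heap size next to the process's private bytes, and only the latter grows once the ETW session is enabled.

    using System;
    using System.Diagnostics;

    // Hypothetical helper for the repro above: compares the managed heap size
    // with the process's private bytes so the native growth is visible.
    static class MemoryProbe
    {
        public static void Report(long iteration)
        {
            // Sample only occasionally so the probe doesn't dominate the loop.
            if (iteration % 1_000_000 != 0)
            {
                return;
            }

            using (var process = Process.GetCurrentProcess())
            {
                long managedBytes = GC.GetTotalMemory(forceFullCollection: false);
                long privateBytes = process.PrivateMemorySize64;
                Console.WriteLine(
                    $"iteration={iteration:N0} managed={managedBytes / (1024 * 1024)} MB private={privateBytes / (1024 * 1024)} MB");
            }
        }
    }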
This does not reproduce on .NET Framework 4.7.2. I don't know about earlier .NET Core runtimes.
Please let me know if you need any more information, but I think this is a pretty minimal repro with clear instructions. One thing I could do is eliminate PerfView by writing a custom app based around TraceEvent or even a WPR profile.
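For what it's worth, a TraceEvent-based collector wouldn't need to be much more than the following rough sketch (the session and file names are arbitrary, it assumes the Microsoft.Diagnostics.Tracing.TraceEvent package, and it has to run elevated to open an ETW session):

    using System;
    using Microsoft.Diagnostics.Tracing;
    using Microsoft.Diagnostics.Tracing.Session;

    class Collector
    {
        static void Main()
        {
            // EventSource providers are identified by a GUID hashed from the name.
            Guid providerGuid = TraceEventProviders.GetEventSourceGuidFromName("PSH-LeakTest");

            // "LeakTestSession" and "LeakTest.etl" are made-up names for this sketch.
            using (var session = new TraceEventSession("LeakTestSession", "LeakTest.etl"))
            {
                session.EnableProvider(providerGuid, TraceEventLevel.Informational);

                Console.WriteLine("Collecting. Press Enter to stop.");
                Console.ReadLine();
            }
        }
    }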
To be clear how this affects Application Insights customers: the Application Insights Profiler runs in Azure App Services. It captures ETW sessions for 2 minutes at a time, about once an hour. One of the providers it enables (listens to) is the Application Insights "Rich Payload Event Source" (see the original bug in the Application Insights repo). Customers are now encountering these memory leaks under load while profiling is running, leading to application restarts.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 37 (25 by maintainers)
I'm preparing the port to 2.2 right now.
@jorive - it sounds like this bug is rather severe. I think we have a strong case to back-port it to V2.2. We should pursue that.
cc @josalem
That definitely doesn’t sound normal to me either.
This makes me less nervous. That being said, AFAIK @jorive already ported his fix (https://github.com/dotnet/coreclr/pull/23661) to 2.2, and if it's not addressing the issue, we might be looking at a different issue from the one that was already ported.
Do you happen to have a memory dump and/or a small repro that I can use to repro the behavior?
@mjsabby Memory isn't used when the pipe is disabled; when it's enabled, memory usage climbs like crazy under traffic, while heap usage remains at the same level. @jorive Is the buffer size configurable? It seems like a bit too much to use 1 GB of memory just to gather some basic metrics for a dashboard and monitoring.
Including @brianrob who wrote this.
I took a quick look, and the key place is the stack, which says that starting with an event write, we try to create a handle for it (for EventPipe), and this seems to allocate ALL THE TIME. However, a handle to an event is about its Meta-Data, and thus there should be a lookup, and MOST of the time we should NOT be creating new event meta-data.
Thus the basic issue is that we are missing our ‘lookup’ that would find that we already have meta-data for this event.
@brianrob - Can you quickly tell us where it SHOULD be happening?
There is a secondary issue that we should not need this meta-data at all unless EventPipe is turned on (which it is not), so it would be nice if we avoided even generating the handle if EventPipe is off.
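Just to illustrate the shape of what's being described (this is not the runtime's actual code; the type and member names below are invented for the sketch), the handle-creation path wants a cache lookup plus a check that EventPipe is actually on:

    using System;
    using System.Collections.Concurrent;

    // Illustrative sketch only: EventMetadataCache, GetOrCreateHandle, and the
    // isEventPipeEnabled flag are invented names, not the runtime's internals.
    class EventMetadataCache
    {
        private readonly ConcurrentDictionary<string, IntPtr> _handles =
            new ConcurrentDictionary<string, IntPtr>();

        public IntPtr GetOrCreateHandle(
            string eventName,
            Func<string, IntPtr> createHandle,
            bool isEventPipeEnabled)
        {
            // Secondary issue: don't build any meta-data at all if EventPipe is off.
            if (!isEventPipeEnabled)
            {
                return IntPtr.Zero;
            }

            // Primary issue: look up existing meta-data so repeated writes of the
            // same event reuse one allocation instead of allocating every time.
            return _handles.GetOrAdd(eventName, createHandle);
        }
    }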
@jorive - We should also check the scenario where we are collecting with EventPipe instead of ETW before we call this bug fixed.
Thanks @pharring, I am able to repro it locally with the code above.
@pharring I’m taking a look.
The thought occurred to me that PerfView is just the tool for tracking down the heap allocations. Indeed, here are the relevant allocating stacks: