NLog: Memory Leaks in GraylogTarget

Hi guys, There is memory leak problem in NLog. We started using new target - Graylog, that implements TargetWithContext. Our most loaded application sends about 200-400 messages per second and, with enabled Graylog target, its memory consumption constantly grow up to almost max available value for the app - 7 GB. This app service hosted on Azure. Please see “Average memory working set” metric dynamics: image One of spikes (that corresponding to collected memory dump): image

Then we collected memory dump of our service with Graylog target enabled. Here also please see selection from analysis report for this dump: image

From screenshot above we can see TOP memory consumers are tyoes from NLog. Observations: Unlimited constant memory grow happen only with our only app that sends highest number of messages per second. Problem occurs only when we enable Graylog target (inherited from TargetWithContext). The same set of messages we duplicate (send) also to Syslog : TargetWithLayout. But it does not create a problem when Graylog is disabled. Memory leak occurs only when both enabled.

NLog version: 4.6.8 (latest stable)

Platform: .Net 4.6.1

Current NLog config image

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      autoReload="true"
      throwExceptions="false"
      keepVariablesOnReload="true">
  <extensions>
    <add assembly="NLog.Targets.Syslog" />
    <!-- Required for Nlog to know where to load target from -->
    <add assembly="Lionbridge.Graylog.Nlog"/>
    <add assembly="Microsoft.ApplicationInsights.NLogTarget" />
  </extensions>

  <targets>
    <target xsi:type="AsyncWrapper"
        name="PapertrailWrapped"
        timeToSleepBetweenBatches="0"
        overflowAction="Grow">
      <target type="Syslog" name="syslog" syslogserver="value_hidden" port="value_hidden" 
              layout="${longdate}|${level:uppercase=true:padding=-5}|${logger}.${message}${onexception:${newline}Exception\: ${exception:format=tostring}}${newline}" 
              facility="Local7" machinename="${var:PapertrailMachineName:default=VoxLeonis}" sender="SER"/>
    </target>
    <target xsi:type="AsyncWrapper"
        name="GraylogWrapped"
        timeToSleepBetweenBatches="0"
        overflowAction="Grow">
      <target name="GrayLog"
              layout="${longdate}|${logger}.${message}"
              xsi:type="GraylogHttp"
              facility="Services"
              graylogServer="value_hidden"
              graylogPort="value_hidden"
              graylogAuthScheme="Basic"
              graylogAuthSecret="value_hidden"
              includeEventProperties="True">
      </target>
    </target>
    <target xsi:type="AsyncWrapper"
        name="ApplicationInsightsWrapped"
        timeToSleepBetweenBatches="0"
        overflowAction="Grow">
      <target type="ApplicationInsightsTarget" name="aiTarget" layout="${time}${logger}.${message}${onexception:${newline}Exception\: ${exception:format=tostring}}${newline}" />
    </target>
  </targets>
  <rules>
    <logger name="_*" minlevel="Trace" writeTo="PapertrailWrapped" />
    <logger name="_*" minlevel="Trace" appendTo="GraylogWrapped"/>
    <logger name="_*" minlevel="Trace" writeTo="ApplicationInsightsWrapped" />
  </rules>
</nlog>

Workarounds I tried: According to recommendations in similar issues here, I tried:

  1. Tried to replace: overflowAction=“Grow” with overflowAction=“Block”
  2. Remove
    layout=“${longdate}|${logger}.${message}” from Graylog target.
  3. Updated NLog from 4.6.8 to newest 4.7.0-rc1 At first glance, it seems it didn’t help (according to observations on staging)…

Later on I’ll collect and attach metrics and memory dump from production, after above 3 applied (it still not released on production).

It is interesting what is root cause of problem and where it is in source code, not only workarounds to fix it.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

How about trying out this layout:

https://www.nuget.org/packages/NLog.GelfLayout

And then use it with the NLog NetworkTarget:

<nlog>
  <time type="AccurateUTC" />
  <extensions>
    <add assembly="NLog.Layouts.GelfLayout" />
  </extensions>
  <targets>
    <target xsi:type="AsyncWrapper" name="GraylogWrapped" overflowAction="Grow">
      <target name="graylogt" xsi:type="network" address="http://graylog:12201">
        <layout xsi:type="GelfLayout" facility="Services">
          <field name="exception_type" layout="${exception:Type}" />
        </layout>
      </target>
    </target>
  </targets>
  <rules>
     <logger name="*" minlevel="debug" writeTo="GraylogWrapped" />
  </rules>
</nlog>

To optimize even further then you can change from http (request/reply) to tcp (streaming):

<nlog>
  <time type="AccurateUTC" />
  <extensions>
      <add assembly="NLog.Layouts.GelfLayout" />
  </extensions>
  <targets>
    <target xsi:type="AsyncWrapper" name="GraylogWrapped" overflowAction="Grow">  
      <target name="graylogt" xsi:type="network" address="tcp://graylog:12200" newLine ="true" lineEnding="Null" >
        <layout xsi:type="GelfLayout" facility="WebApi">
          <field name="exception_type" layout="${exception:Type}" />
        </layout>
      </target>
    </target>
  </targets>
  <rules>
      <logger name="*" minlevel="debug" writeTo="GraylogWrapped" />
  </rules>
</nlog>

See how to setup Graylog for TCP from #3542

Believe NLog 4.7 RC will improve the Network-Target even further. See also #3739

@TheXenocide This is why I recommend changing to TCP-streaming instead of HTTP-request/reply. This will allow TCP-network-protocol to perform batching.

But I guess it is also possible to do batching with HTTP-request/reply by writing multiple logevents into the same payload using newline.

The only problem with batching is handling retransmission’s on connection-failure.

Is it possible that messages are simply being written to an async queue faster than the target can stream them out? If, for example, the target successfully writes 200 messages per second and you’re adding 400 messages per second the queue will grow by 200 messages every second.

If there’s a timestamp on the log and you can place a timestamp on the receiving end you can compare to see how long from the time the event was written to the time that it was received by the server. If that difference continues to grow then it would be a pretty good indicator that this is your problem.

If that is your problem you could either make the logging synchronous so that execution will not continue until the log message has completed writing to the server; otherwise the only way for that queue to catch up is to stop writing messages to it long enough for it to flush the backlog.