NLog: Deadlock when under load

Hi! Thanks for reporting this feature/bug/question!

Please keep / fill in the relevant info from this template so that we can help you as best as possible.

For .NET Core users, please check the platform support: https://github.com/NLog/NLog/wiki/platform-support

Type (choose one):

  • Bug

NLog version: 5.0.0-beta06 (and 5.0.0-beta03)

Platform: .NET Core

Current NLog config (xml or C#, if relevant)

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      autoReload="true"
      internalLogLevel="Warn"
      internalLogFile="/tmp/internal-nlog.txt">
 
  <extensions>
    <add assembly="PlayAuth.Common" />
  </extensions>
 
  <variable name="correlationLayout" value="${when:when=length('${correlationId}') &gt; 0:inner= Id=${correlationId},}" />
  <variable name="messageLayout" value="Message=${replace-newlines:Message=${truncated:Limit=10000:inner=${message}}" />
  <variable name="exceptionLayout" value="${when:when=length('${exception}') &gt; 0:inner= Type=${exception:Format=Type} Exception=${exception:format=ToString,StackTrace}${newline}}" />
 
  <targets>
    <target name="splunkfileAsync" xsi:type="AsyncWrapper" queueLimit="20000" overflowAction="Discard">
      <target xsi:type="File" name="file"
              layout="[${date:universalTime=true:format=O}] Level=${level}, Category=${logger},${correlationLayout} ${messageLayout} ${exceptionLayout}"
              fileName="/var/log/starz/playauth.log" archiveFileName="/var/log/starz/log-own.log" archiveAboveSize="52428800" archiveNumbering="Sequence" maxArchiveFiles="2" />
    </target>
    <target xsi:type="Null" name="blackhole" />
  </targets>
 
  <rules>
    <logger name="Microsoft.*" minlevel="Trace" writeTo="blackhole" final="true" />
    <logger name="NewRelic.*" minlevel="Trace" writeTo="blackhole" final="true" />
    <logger name="Enyim.*" levels="Trace,Debug" writeTo="blackhole" final="true" />
    <logger name="*" minlevel="Trace" writeTo="splunkfileAsync" />
  </rules>
</nlog>

In case of a BUG:

  • What is the current result? Under heavy load, we experience an apparent deadlock and threadpool exhaustion due to a lock somewhere inside of nlog

  • What is the expected result? No deadlocks

  • Did you checked the Internal log? Yes

  • Please post full exception details (message, stacktrace, inner exceptions)

  • Are there any work arrounds? No

  • Is there a version in which it did worked? Not with .net core

  • Can you help us by writing an unit test? Unfortunately, not yet. Trying to reliably reproduce has not yet succeeded.

  • Details: We are using NLog 5 with .NET Core in an ASP.NET MVC web api application and encountering what appear to be occasional deadlock issues in production. The problem manifests as a sudden “lock up” of the server. HTTP clients are still able to connect to the socket, but the requests are not serviced. We can see that the number of threads in the application spikes up to the ~30000 number when the problem occurs. Thus far, we have not been able to reproduce it anywhere besides production. We have encountered this problem with 5.0.0-beta03 as well as with 5.0.0-beta06.

We are running in Docker on Linux. After enlisting some help from some folks at Microsoft, we were finally able to get some thread dumps on one of the locked processes. In each case, it appears that all but a handful of system threads are waiting on a monitor, and that monitor was entered by the nlog AsyncTargetWrapper.Write or PrecalculateVolatileLayouts method.

We’ve looked at the nlog source but are not really able to see anything that would explain the deadlock… hoping that the experts in the NLog project would be able to shed some light. 😃

Any details you have would be greatly appreciated. Thank you very much!

  • Thread dump
(lldb) dumpstack        
OS Thread Id: 0x4515 (6754)
TEB information is not available so a stack size of 0xFFFF is assumed
Current frame: libpthread.so.0!__pthread_cond_wait + 0xbf
Child-SP         RetAddr          Caller, Callee
00007FE140F27310 00007fe6f61e04e2 libcoreclr.so!CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 0x142, calling libcoreclr.so!pthread_cond_wait
00007FE140F27370 00007fe6f61e01a0 libcoreclr.so!CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 0x210, calling libcoreclr.so!CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*)
00007FE140F273D0 00007fe6f61e5051 libcoreclr.so!CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int) + 0x751
00007FE140F27570 00007fe6f5e3254b libcoreclr.so!Thread::DoAppropriateWaitWorker(int, void**, int, unsigned int, WaitMode) + 0x22b, calling libcoreclr.so!WaitForMultipleObjectsEx
00007FE140F27610 00007fe6f5e2d2f0 libcoreclr.so!Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*) + 0x50, calling libcoreclr.so!Thread::DoAppropriateWaitWorker(int, void**, int, unsigned int, WaitMode)
00007FE140F27650 00007fe6f5f1f985 libcoreclr.so!CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 0x65, calling libcoreclr.so!Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*)
00007FE140F276B0 00007fe6f6c5a27d libc.so.6!__clock_gettime + 0x1d
00007FE140F276F0 00007fe6f5e2cc01 libcoreclr.so!AwareLock::EnterEpilogHelper(Thread*, int) + 0x141, calling libcoreclr.so!CLREventBase::Wait(unsigned int, int, PendingSync*)
00007FE140F27780 00007fe6f5e2c22a libcoreclr.so!AwareLock::Enter() + 0xca, calling libcoreclr.so!AwareLock::EnterEpilogHelper(Thread*, int)
00007FE140F277D0 00007fe6f5e2c932 libcoreclr.so!AwareLock::Contention(int) + 0x402, calling libcoreclr.so!AwareLock::Enter()
00007FE140F27880 00007fe6f5eb3c55 libcoreclr.so!JIT_MonContention_Helper(Object*, unsigned char*, void*) + 0x175, calling libcoreclr.so!AwareLock::Contention(int)
00007FE140F27948 00007fe67d71202b (MethodDesc 00007fe67cdb8d20 + 0x6b NLog.Targets.Target.PrecalculateVolatileLayouts(NLog.LogEventInfo)), calling 00007fe6f5eb4530 (stub for System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef))
00007FE140F279F0 00007fe6f5eb3bed libcoreclr.so!JIT_MonContention_Helper(Object*, unsigned char*, void*) + 0x10d, calling libcoreclr.so!LazyMachStateCaptureState
00007FE140F27A40 00007fe6f5eb4656 libcoreclr.so!JIT_MonReliableEnter_Portable + 0x126, calling libcoreclr.so!JIT_MonContention_Helper(Object*, unsigned char*, void*)
00007FE140F27A60 00007fe67d71202b (MethodDesc 00007fe67cdb8d20 + 0x6b NLog.Targets.Target.PrecalculateVolatileLayouts(NLog.LogEventInfo)), calling 00007fe6f5eb4530 (stub for System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef))
00007FE140F27AD0 00007fe67d711c4c (MethodDesc 00007fe67d4aa958 + 0x3c NLog.Targets.Wrappers.AsyncTargetWrapper.Write(NLog.Common.AsyncLogEventInfo)), calling (MethodDesc 00007fe67cdb8d20 + 0 NLog.Targets.Target.PrecalculateVolatileLayouts(NLog.LogEventInfo))
00007FE140F27B10 00007fe67d711b8f (MethodDesc 00007fe67d4aa960 + 0x3f NLog.Targets.Wrappers.AsyncTargetWrapper.WriteAsyncThreadSafe(NLog.Common.AsyncLogEventInfo))
00007FE140F27B50 00007fe67d7119e8 (MethodDesc 00007fe67cdb8d38 + 0x118 NLog.Targets.Target.WriteAsyncLogEvent(NLog.Common.AsyncLogEventInfo))
00007FE140F27BB0 00007fe67d71173d (MethodDesc 00007fe67d4549f8 + 0xcd NLog.LoggerImpl.WriteToTargetWithFilterChain(NLog.Internal.TargetWithFilterChain, NLog.LogEventInfo, NLog.Common.AsyncContinuation)), calling (MethodDesc 00007fe67cdb8d38 + 0 NLog.Targets.Target.WriteAsyncLogEvent(NLog.Common.AsyncLogEventInfo))
00007FE140F27BF0 00007fe67d7115f6 (MethodDesc 00007fe67d454998 + 0x156 NLog.LoggerImpl.Write(System.Type, NLog.Internal.TargetWithFilterChain, NLog.LogEventInfo, NLog.LogFactory)), calling (MethodDesc 00007fe67d4549f8 + 0 NLog.LoggerImpl.WriteToTargetWithFilterChain(NLog.Internal.TargetWithFilterChain, NLog.LogEventInfo, NLog.Common.AsyncContinuation))
00007FE140F27C00 00007fe67d711378 (MethodDesc 00007fe67cdaaa78 + 0xc8 NLog.Logger.WriteToTargets(NLog.LogLevel, System.IFormatProvider, System.String)), calling (MethodDesc 00007fe67cdaa9a8 + 0 NLog.Logger.PrepareLogEventInfo(NLog.LogEventInfo))
00007FE140F27C40 00007fe67db48f14 (MethodDesc 00007fe67d3a9730 + 0x4b4 PlayAuth.Api.Middleware.HealthCheckMiddleware+<GetInternal>d__5.MoveNext()), calling (MethodDesc 00007fe67cdaabb8 + 0 NLog.Logger.Trace(System.String))
00007FE140F27EB0 00007fe67c7787f2 (MethodDesc 00007fe67c33d9e0 + 0x52 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib]].Start[[System.__Canon, System.Private.CoreLib]](System.__Canon ByRef))
00007FE140F27EC0 00007fe67c778795 (MethodDesc 00007fe67c3640c0 + 0x45 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib]].Create()), calling libcoreclr.so!JIT_ByRefWriteBarrier
00007FE140F27EF0 00007fe67db485c4 (MethodDesc 00007fe67d38da38 + 0xf4 PlayAuth.Api.Middleware.HealthCheckMiddleware.GetInternal(System.Collections.Generic.IList`1<PlayAuth.Common.Health.IHealthIndicator>)), calling (MethodDesc 00007fe67c33d9e0 + 0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib]].Start[[System.__Canon, System.Private.CoreLib]](System.__Canon ByRef))
00007FE140F27F70 00007fe67db476d1 (MethodDesc 00007fe67d3a9308 + 0x311 PlayAuth.Api.Middleware.HealthCheckMiddleware+<Invoke>d__3.MoveNext()), calling (MethodDesc 00007fe67d38da38 + 0 PlayAuth.Api.Middleware.HealthCheckMiddleware.GetInternal(System.Collections.Generic.IList`1<PlayAuth.Common.Health.IHealthIndicator>))
00007FE140F28250 00007fe67c77b712 (MethodDesc 00007fe67c33ce98 + 0x52 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[System.__Canon, System.Private.CoreLib]](System.__Canon ByRef))
00007FE140F28260 00007fe67c77b6b5 (MethodDesc 00007fe67c364560 + 0x45 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Create()), calling libcoreclr.so!JIT_ByRefWriteBarrier
00007FE140F28290 00007fe67db4732f (MethodDesc 00007fe67d38da18 + 0x11f PlayAuth.Api.Middleware.HealthCheckMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext, System.Collections.Generic.IEnumerable`1<PlayAuth.Common.Health.IHealthIndicator>, Microsoft.Extensions.Options.IOptions`1<PlayAuth.Common.Configuration.EnvironmentConfig>)), calling (MethodDesc 00007fe67c33ce98 + 0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[System.__Canon, System.Private.CoreLib]](System.__Canon ByRef))
00007FE140F28320 00007fe67db25d6e (MethodDesc 00007fe67d520e20 + 0x22e Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext))
00007FE140F28410 00007fe67db24db4 (MethodDesc 00007fe67dcbb4b0 + 0x204 Microsoft.AspNetCore.Hosting.Internal.RequestServicesContainerMiddleware+<Invoke>d__3.MoveNext())
00007FE140F28450 00007fe67db24b25 (MethodDesc 00007fe67dcbb618 + 0x45 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Hosting.Internal.RequestServicesContainerMiddleware+<Invoke>d__3, Microsoft.AspNetCore.Hosting]](<Invoke>d__3 ByRef)), calling (MethodDesc 00007fe67dcbb4b0 + 0 Microsoft.AspNetCore.Hosting.Internal.RequestServicesContainerMiddleware+<Invoke>d__3.MoveNext())
00007FE140F28460 00007fe67d7a97b7 (MethodDesc 00007fe67da502e8 + 0x57 DomainBoundILStubClass.IL_STUB_InstantiatingStub(Microsoft.Extensions.Logging.LogLevel, Microsoft.Extensions.Logging.EventId, System.Object, System.Exception, System.Func`3<System.Object,System.Exception,System.String>))
00007FE140F28490 00007fe67db24aaa (MethodDesc 00007fe67d4042c0 + 0x7a Microsoft.AspNetCore.Hosting.Internal.RequestServicesContainerMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext)), calling (MethodDesc 00007fe67dcbb618 + 0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Hosting.Internal.RequestServicesContainerMiddleware+<Invoke>d__3, Microsoft.AspNetCore.Hosting]](<Invoke>d__3 ByRef))
00007FE140F28520 00007fe67db24a11 (MethodDesc 00007fe67da4d530 + 0x11 Microsoft.AspNetCore.Hosting.Internal.HostingApplication.ProcessRequestAsync(Context))
00007FE140F28530 00007fe67db12226 (MethodDesc 00007fe67dc9d890 + 0x736 Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Frame`1+<RequestProcessingAsync>d__2[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]].MoveNext())
00007FE140F28620 00007fe67db10d25 (MethodDesc 00007fe67dc9ddf0 + 0x45 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Frame`1+<RequestProcessingAsync>d__2[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]], Microsoft.AspNetCore.Server.Kestrel]](<RequestProcessingAsync>d__2<Context> ByRef)), calling (MethodDesc 00007fe67dc9d890 + 0 Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Frame`1+<RequestProcessingAsync>d__2[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]].MoveNext())
00007FE140F28660 00007fe67db10534 (MethodDesc 00007fe67dc852e0 + 0x94 Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Frame`1[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]].RequestProcessingAsync()), calling (MethodDesc 00007fe67dc9ddf0 + 0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Frame`1+<RequestProcessingAsync>d__2[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]], Microsoft.AspNetCore.Server.Kestrel]](<RequestProcessingAsync>d__2<Context> ByRef))
00007FE140F28730 00007fe67c76bf8c (MethodDesc 00007fe67c362458 + 0x9c System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib]].InnerInvoke())
00007FE140F28760 00007fe67c7649c6 (MethodDesc 00007fe67c361800 + 0x46 System.Threading.Tasks.Task.Execute())
00007FE140F28790 00007fe67c714e71 (MethodDesc 00007fe67c357a40 + 0x71 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object))
00007FE140F287A0 00007fe67c76501c (MethodDesc 00007fe67c361880 + 0x2ec System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)), calling libcoreclr.so!JIT_CheckedWriteBarrier
00007FE140F287E0 00007fe67c764fc7 (MethodDesc 00007fe67c361880 + 0x297 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)), calling (MethodDesc 00007fe67c357a40 + 0 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object))
00007FE140F288A0 00007fe67c764ce8 (MethodDesc 00007fe67c361870 + 0x98 System.Threading.Tasks.Task.ExecuteEntry(Boolean)), calling (MethodDesc 00007fe67c361880 + 0 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef))
00007FE140F288C0 00007fe67c77ad25 (MethodDesc 00007fe67c3643a0 + 0x1a5 System.Threading.ThreadPoolWorkQueue.Dispatch())
00007FE140F28920 00007fe6f5f53a37 libcoreclr.so!CallDescrWorkerInternal + 0x7c
00007FE140F28940 00007fe6f5e684bc libcoreclr.so!MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 0x36c, calling libcoreclr.so!CallDescrWorkerInternal
00007FE140F28960 00007fe6f5e684c8 libcoreclr.so!MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 0x378, calling libcoreclr.so!NativeExceptionHolderBase::~NativeExceptionHolderBase()
00007FE140F289A0 00007fe6f61c254f libcoreclr.so!UnicodeToUTF8 + 0xcf, calling libcoreclr.so!UTF8Encoding::GetBytes(char16_t*, int, unsigned char*, int)
00007FE140F28A00 00007fe6f5e212a8 libcoreclr.so!MetaSig::GetReturnTypeNormalized(TypeHandle*) const + 0x58, calling libcoreclr.so!SigPointer::PeekElemTypeClosed(Module*, SigTypeContext const*) const
00007FE140F28A20 00007fe6f5e212a8 libcoreclr.so!MetaSig::GetReturnTypeNormalized(TypeHandle*) const + 0x58, calling libcoreclr.so!SigPointer::PeekElemTypeClosed(Module*, SigTypeContext const*) const
00007FE140F28B00 00007fe6f5ff4b16 libcoreclr.so!QueueUserWorkItemManagedCallback(void*) + 0xc6, calling libcoreclr.so!MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int)
00007FE140F28C20 00007fe6f5e384ee libcoreclr.so!ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 0x1ae
00007FE140F28C30 00007fe6f5e385a3 libcoreclr.so!ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 0x263, calling libcoreclr.so!Frame::PopIfChained()
00007FE140F28C40 00007fe6f5db02c1 libcoreclr.so!CExecutionEngine::GetTlsData() + 0x11, calling libcoreclr.so!__tls_get_addr
00007FE140F28D40 00007fe6f5e38ca6 libcoreclr.so!ManagedThreadBase::ThreadPool(ADID, void (*)(void*), void*) + 0x46, calling libcoreclr.so!ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)
00007FE140F28D90 00007fe6f5fd5287 libcoreclr.so!ManagedPerAppDomainTPCount::DispatchWorkItem(bool*, bool*) + 0x127, calling libcoreclr.so!ManagedThreadBase::ThreadPool(ADID, void (*)(void*), void*)
00007FE140F28DE0 00007fe6f6c5a27d libc.so.6!__clock_gettime + 0x1d
00007FE140F28DF0 00007fe6f61cbbe3 libcoreclr.so!GetTickCount + 0x23, calling libcoreclr.so!clock_gettime
00007FE140F28E40 00007fe6f5fd4ae7 libcoreclr.so!PerAppDomainTPCountList::GetAppDomainIndexForThreadpoolDispatch() + 0x47
00007FE140F28E80 00007fe6f5e58a2c libcoreclr.so!ThreadpoolMgr::WorkerThreadStart(void*) + 0x4bc
00007FE140F28EF0 00007fe6f61eb1a9 libcoreclr.so!CorUnix::CPalThread::ThreadEntry(void*) + 0x129
00007FE140F28FB0 00007fe6f6c4d62d libc.so.6!clone + 0x6d

We are also using a custom renderer:

 [LayoutRenderer("correlationId")]
    public class CorrelationIdLayoutRenderer : LayoutRenderer
    {
        private IHttpContextAccessor _httpContextAccessor;

        private IHttpContextAccessor HttpContextAccessor => _httpContextAccessor ??
                                                            (_httpContextAccessor = ServiceLocator.ServiceProvider?.GetService<IHttpContextAccessor>());

        protected override void Append(StringBuilder builder, LogEventInfo logEvent)
        {
            var context = HttpContextAccessor?.HttpContext?.RequestServices?.GetService(typeof(LogContext)) as LogContext;

            if (context != null)
            {
                builder.Append(context.CorrelationId.ToString("N"));
            }
        }
    }

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 19 (12 by maintainers)

Most upvoted comments

Yet another option, that only works for ASP.NET Core is to use HttpContext.TraceIdentifier:

https://github.com/NLog/NLog.Web/issues/92

Yet an alternative way would be to make use of MappedDiagnosticsLogicalContext (MDC):

https://tpodolak.com/blog/2016/07/05/nlog-tracking-flow-of-requests-using-mappeddiagnosticslogicalcontext/

Right now not available but it should come with #2025 (Will be released in 5.0 BETA 7).

Then it should be a matter of setting the request-correlation-id in the MDC on start-request, and then one can refer to it in the NLog Target Layout.