runtime: How to debug StackOverflowException

@Daniel15 commented on Wed Oct 25 2017

I’m getting this error while moving a site from ASP.NET Core 1.1 on Mono to ASP.NET Core 2.0 on .NET Core 2.0:

dbug: Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker[2]
      Executed action method Daniel15.Web.Controllers.ShortUrlController.Index (Daniel15.Web), returned result Microsoft.AspNetCore.Mvc.ContentResult.
Process is terminating due to StackOverflowException.
[1]    12976 abort      LD_LIBRARY_PATH=/tmp/ssltest ASPNETCORE_ENVIRONMENT=Development =

How do I get a full stack trace for the StackOverflowException to determine where it’s coming from?

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 14
  • Comments: 21 (11 by maintainers)

Most upvoted comments

@cdmihai Presumably at this point it would be hard to print the stack trace (there is no stack with which to work, after all). But I want to join in and comment that anything would be good here. Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

I guess managed code runs on the native thread stack.

That’s right. We already run the SIGSEGV handler on an alternate stack so that we can at least print the message and not just silently die. This alternate stack is kept as small as possible since we need to allocate it for each thread. That size would likely not be enough to run the code that’s necessary to dump the stack trace. But since we’ve recently switched to allocating the alternate stack space using mmap, we could actually reserve a larger VM space and commit just the size needed by the regular SIGSEGV handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace. I’ve created dotnet/runtime#825, assigned to myself, to track it.
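The reserve-then-commit idea described above can be sketched in plain C. This is only an illustration under stated assumptions, not the CoreCLR implementation: the names `on_overflow`, `install_handler`, `recurse`, and `run_overflow_in_child`, the sizes, and the exit code are all hypothetical, and the sketch assumes a Linux-like POSIX system. It reserves address space with `mmap(PROT_NONE)`, commits only the top of it, registers the region with `sigaltstack`, and commits the rest from inside the handler.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sizes: reserve 256 KiB of address space, commit the top 64 KiB. */
enum { RESERVED = 256 * 1024, COMMITTED = 64 * 1024, HANDLED_EXIT = 42 };

static char *alt_base;               /* lowest address of the reservation */
static volatile int keep_going = 1;  /* stops the compiler from folding the recursion */

static void on_overflow(int sig) {
    (void)sig;
    /* Commit the rest of the reservation so that heavier work (such as a
       stack walk) would have room. This sketch only prints a message. */
    mprotect(alt_base, RESERVED, PROT_READ | PROT_WRITE);
    static const char msg[] = "stack overflow: handler running on alternate stack\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(HANDLED_EXIT);
}

static int install_handler(void) {
    /* Reserve address space without committing any memory. */
    alt_base = mmap(NULL, RESERVED, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (alt_base == MAP_FAILED) return -1;
    /* Stacks grow down, so the usable part is the top of the region. */
    if (mprotect(alt_base + RESERVED - COMMITTED, COMMITTED,
                 PROT_READ | PROT_WRITE) != 0) return -1;

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = RESERVED;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_overflow;
    sa.sa_flags = SA_ONSTACK;        /* run the handler on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Unbounded recursion; the volatile buffer keeps every frame on the stack. */
static int recurse(int depth) {
    volatile char pad[512];
    pad[0] = (char)depth;
    if (!keep_going) return 0;
    return recurse(depth + 1) + pad[0];
}

/* Overflows the stack in a child process and returns the child's exit code. */
int run_overflow_in_child(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            rl.rlim_cur = 1024 * 1024;       /* small stack so the overflow is quick */
            setrlimit(RLIMIT_STACK, &rl);
        }
        if (install_handler() != 0) _exit(1);
        _exit(recurse(0) & 1);               /* not reached: the overflow faults first */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_overflow_in_child()` from a small `main` lets the parent observe whether the handler ran: a return value of `HANDLED_EXIT` means the child reported the overflow from the alternate stack instead of dying silently. The overflow is run in a forked child because the faulting stack cannot be resumed after the handler exits.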

In other words, just as CoreCLR allocates an OutOfMemoryException instance up front, could we allocate some space in advance (1KB should be more than enough) and do the work there?

I’d like to add that when running ASP.NET Core in an Azure App Service it’s even more painful because the EventLog.xml file that Azure App Services maintains for you doesn’t record any mention of the process being killed due to a stack-overflow. That’s maddening. This means that every unexpected stack-overflow causes 2-3 hours of figuring out “why isn’t the website working?” because there’s no indication the entire process is crashing in the first place.

It seems that in Azure the only solution is to enable short-term crash monitoring, reproduce the issue (assuming you can reproduce it consistently and reliably in the first place!), download the multi-gigabyte .dmp file that Azure Portal saves to your blob storage account, and then wait over 30 minutes for Visual Studio to chew through it, all while VS shows an ugly pop-up informing me that a background process is “taking too long” and offering only a (very tempting) “Terminate” button.

So I’d describe the issue more broadly as: the overall developer UX for diagnosing and investigating stack-overflow crashes in .NET Core is abysmal and this is especially disappointing given Microsoft has a generally good reputation for developer-tooling - and we never had this problem in .NET Framework 1.x, where we could at least catch( StackOverflowException ).


Out of curiosity (and I know it’s off-topic), why doesn’t EventLog.xml record app crashes caused by stack overflows?

Got this from the console …

Api> Route matched with {action = "Get", controller = "App"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get(Microsoft.AspNet.OData.Query.ODataQueryOptions`1[Core.Objects.Entities.CMS.App]) on controller Api.Controllers.AppController (Api).
Api>
Api> Process is terminating due to StackOverflowException.

I put a breakpoint in the action, but execution never gets that far. So how do I debug stack overflows in DI?

That’s right. We already run the SIGSEGV handler on an alternate stack so that we can at least print the message and not just silently die. This alternate stack is kept as small as possible since we need to allocate it for each thread. That size would likely not be enough to run the code that’s necessary to dump the stack trace. But since we’ve recently switched to allocating the alternate stack space using mmap, we could actually reserve a larger VM space and commit just the size needed by the regular SIGSEGV handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace.

Where is the stack trace dumped to: standard error or standard output? I am debugging in an orchestrated, containerized environment. When the app crashes because of a StackOverflowException, the container goes away and all that is left is stderr and stdout:

2019-02-28T14:33:34.98-0500 [APP/PROC/WEB/0] ERR Process is terminating due to StackOverflowException.

What’s the best way to debug a StackOverflowException in this kind of environment?

Does using WinDbg and SOS still work with .NET Core?

As described here: https://stackoverflow.com/a/49882734/684096

The only thing that could work is to run the app under lldb and when it hits the stack overflow, load the libsosplugin.so and run “clrstack -f”.