runtime: Segmentation fault (core dumped) started popping up a lot since upgrading to .NET 7

Description

Since migrating to .NET 7, we have started getting a lot of Segmentation fault (core dumped) crashes in our ASP.NET Core applications.

Unlike other issues here, .NET was not installed with snap or any other distribution tool, since we are using the official .NET Docker image:

dotnet/aspnet:7.0-jammy after we migrated from dotnet/aspnet:6.0.15-jammy.

dmesg yielded this result:

[ 5198.137895] dotnet[6138]: segfault at 7f365c282000 ip 00007f36d250b99e sp 00007ffd7ece7738 error 4 in libc.so.6[7f36d2385000+195000]
[ 5198.145939] Code: 00 00 00 00 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 83 fa 20 77 36 b9 ff ff ff ff c4 e2 68 f5 c9 c5 fb 92 d1 62 e1 7f 2a 6f 16 <62> f3 6d 22 3e 0f 04 c5 fb 93 c1 85 c0 75 03 c3 66 90 f3 0f bc c0

Let me know if there is any additional info I can gather that will help you debug the issue.

Reproduction Steps

The only thing our Dockerfile adds is the installation of New Relic’s agent (which was also there with .NET 6).

# Newrelic agent
RUN apt-get update && apt-get install -y wget ca-certificates gnupg curl \
&& echo 'deb http://apt.newrelic.com/debian/ newrelic non-free' | tee /etc/apt/sources.list.d/newrelic.list \
&& wget https://download.newrelic.com/548C16BF.gpg \
&& apt-key add 548C16BF.gpg \
&& apt-get update \
&& apt-get install -y newrelic-dotnet-agent \
&& rm -rf /var/lib/apt/lists/*

ENV CORECLR_ENABLE_PROFILING=1 \
CORECLR_PROFILER={36032161-FFC0-4B61-B559-F6C5D41BAE5A} \
CORECLR_NEWRELIC_HOME=/usr/local/newrelic-dotnet-agent \
CORECLR_PROFILER_PATH=/usr/local/newrelic-dotnet-agent/libNewRelicProfiler.so
# Newrelic agent

Regression?

We did see some issues with dotnet/aspnet:6.0.15-jammy, but only once or twice a week; now it’s dozens a day.

Known Workarounds

No response

Configuration

No response

Other information

No response

About this issue

  • State: open
  • Created 10 months ago
  • Comments: 20 (13 by maintainers)

Most upvoted comments

@mangod9 can you please guide us on how to collect these dumps on k8s and Docker?

Documentation on collecting crash dumps is here: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/collect-dumps-crash

In addition to the above, if you are running in Kubernetes there are a few more things to consider:

  1. You might need to increase container memory limits. Once .NET is configured to collect a dump on crash, it launches createdump inside the container, which increases both disk space and memory usage. We suggest temporarily increasing the container memory limits as needed.
  2. If your Kubernetes system uses liveness probes to monitor container health, it might terminate the container while the dump is being collected, because the service becomes non-responsive once it crashes. Depending on the size of the process, createdump can take a while to complete, and the health monitoring system may terminate the container before it finishes. If this happens, we suggest making the liveness probes less aggressive.
  3. We suggest mounting non-ephemeral storage into the container so that the dump can be retrieved after the crash. The .NET crash dump environment variables let you configure the location, but the dump is always written to a directory inside the container. If the file system is destroyed after the dump is created, the dump cannot be downloaded. A couple of options: a) mount a host directory into the container, then scp the dump from the node, or b) use a network-based storage solution, such as Azure Storage, to egress the dump from Kubernetes to a location where it can be retrieved later.
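
As a rough sketch of the dump configuration described above, the crash dump environment variables from the linked documentation could be added to the Dockerfile like this. The /dumps path is a hypothetical mount point chosen for illustration; it should point at whatever non-ephemeral volume you attach to the container.

```dockerfile
# Sketch only: enable dump collection on crash for the .NET runtime.
# /dumps is an assumed path; mount a host directory or network volume there.
# DbgMiniDumpType=4 requests a full dump (the largest option, so plan for
# the extra disk space and memory mentioned in item 1).
# %p in the dump name expands to the PID of the crashing process.
ENV DOTNET_DbgEnableMiniDump=1 \
    DOTNET_DbgMiniDumpType=4 \
    DOTNET_DbgMiniDumpName=/dumps/coredump.%p
```

The same variables can also be set in the container spec instead of the image, which avoids rebuilding the image just to turn dump collection on or off.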

Hope this helps!

Can you try removing the newrelic profiler and see if the issue still persists?
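
One low-effort way to test this without rebuilding the whole image, assuming the ENV block from the reproduction steps is otherwise unchanged, is to switch profiling off with the standard CORECLR_ENABLE_PROFILING variable. The runtime then skips loading the profiler library even though the agent package is still installed.

```dockerfile
# Sketch only: place after the existing New Relic ENV block to override it.
# With profiling disabled, the runtime does not load libNewRelicProfiler.so.
ENV CORECLR_ENABLE_PROFILING=0
```

If the segfaults stop with the profiler disabled, that points toward the profiler; if they continue, it can be ruled out.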