runtime: dotnet-dump causes the process to double its memory usage and fails
Description
In a Kubernetes environment, we have a process that normally consumes around 3.8 Gi. When we run `dotnet-dump collect`, the process increases its memory usage to around 7.2 Gi. Since we have a 6 Gi memory limit for the Pod, `dotnet-dump` cannot finish dump generation and fails with a `System.IO.EndOfStreamException: Unable to read beyond the end of the stream` exception. If we set a higher memory limit, `dotnet-dump collect` succeeds, approximately doubling the used memory.
Is this expected behavior? Is it possible to have it just save the dump to the file without consuming more memory?
Reproduction Steps
Run `dotnet-dump collect --process-id 1`
Expected behavior
A dump file is created
Actual behavior
Dump file generation fails and the process may crash
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
No response
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 10
- Comments: 53 (34 by maintainers)
Good news, I wrote a PoC using `/proc/pid/pagemap`, and the results are impressive! Only a small increase in the libs' size, no stacks or anon memory increase, and the dump file size went down from 171 MB to 33 MB for a test application. Tested `clrstack`, `dumpheap -stat`, and a couple of `do` with `dotnet-dump analyze`; everything seems to work. I'll clean up my changes and make a PR next week.

@afilatov-st Regarding a backport: I already did one for .NET 5 and will do one for .NET 6 today. Note these are only Linux binaries that were built with the CentOS 7 docker image following this instruction. Feel free to cherry-pick it and compile it yourself if you need something else 😃
To check that the fallback happened, you should run `dotnet-dump collect -p <pid> --diag`. It will print messages into the output of the application; then search for FAILED.
I’ve been investigating this and figured out why createdump’s memory usage increases so much, but I don’t have any fix yet. I haven’t come up with any workaround other than creating “full” dumps, nor any fix, especially one that would fit in our 7.0 schedule.
Indeed I have some concerns about permissions. From the kernel pagemap doc:
Both kernels I tested on were > 4.2, so I was able to retrieve the `present` and `swapped` flags we need, but got a zeroed-out PFN. I think we should try to open the file and, if it fails, fall back to the current behavior.

Not any more; we were able to increase the max memory to more than double the initial setting and were able to get a dump.
Backport for .NET 6 based on 6.0.12:
Regarding WSL 2 Ubuntu vs CentOS, there is definitely a difference in handling committed memory (the actual reason is not WSL 2, but the kernel version). We’ve seen differences in memory accounting between older and more recent kernels in #72067. As for the dump creation causing memory usage growth, I think it makes sense. We include memory ranges that were never touched before, so no physical memory pages were backing them. But once we read them to store them in the dump, we cause the allocation of the physical pages. One thing that we could try is to use the `mincore` function (https://man7.org/linux/man-pages/man2/mincore.2.html). For a given memory range, it extracts a bitmap of pages that are resident in memory, so we could use it as a filter to skip pages that are not resident. I guess there will be a few gotchas for things like shared library pages, but maybe for those we could just include them, as their total size is going to be minimal.

Hello, I’m facing the same issue with `createdump` increasing the memory usage of the application up to a point where it fails with OOM. By comparing `/proc/<pid>/smaps` content before and after the dump, I split the memory usage increase (I mean the VmRSS increase) into 3 categories: related to the code of the libraries, related to thread stacks, and other anonymous regions.

The most interesting finding is that there’s a difference between operating systems. On WSL 2 Ubuntu I only get an RssFile increase for the libraries’ regions:
However, in the production environment with CentOS I get:
I also built a custom version of createdump where I could do some printf debugging. I don’t find any difference in the syscalls used on CentOS and Ubuntu; in both cases it’s `process_vm_readv`. The pattern in which the memory is read also seems to be the same: we first read 1 byte from every page of the region, then combine the regions, then read in 16K chunks. On both OSes, 8 MB are read for thread stacks, but on CentOS this read results in the 8 MB block being committed in the parent process.

One more thing to add: I think we should not concentrate on the thread stacks specifically. With the real app I found that while stack traces account for 2 GB of the increase, we have 3.5 GB of other anonymous regions committed. This is probably related to the memory usage of native libs like the Kafka client; I have no idea how to find out what exactly they are.
While the issue is easily reproducible, I don’t know what to look at further. Would you have any ideas about what I could try? Thanks!
@tommcdon I confirmed that the swap is off.
Thanks for the answer. I doubt it, because the app runs in a Kubernetes environment where swap should be off. I’ll check with `getrusage` and get back with the results.