runtime: Memory leak with AWS SDK for .NET using .NET Core 3.1.300 on macOS 10.15 Catalina
Description
I’m running into a memory leak when using the AWS SDK for .NET to send data to the AWS Kinesis Data Stream endpoint. Looking at the memory leak report generated using macOS’s built-in leaks utility, part of the leak might be caused by System.Security.Cryptography.Native.Apple.dylib.
I have created a min-reproducible example for the memory leak, found in this repository. The operation requirements as well as detailed instructions can be found in the repository’s README.
On some machines, the memory usage of the app can grow to >1GB. On other machines, the memory usage does not grow noticeably, but the amount of leaked memory (as determined by the leaks tool) continues to grow. The leak in both cases has the same sources - I’ve verified this using the leaks tool output from both groups of machines. This high memory usage causes customers’ machines to slow down considerably and affects productivity.
Configuration
- .NET Core version: 3.1.300
- OS version: macOS Catalina 10.15.5
- Architecture: x64
- Configuration-specific: No, the memory leak has been seen on macOS 10.14 Mojave as well, with .NET Core versions 3.0, 3.1.1, and 3.1.2.
Regression?
Not sure - we have seen this issue previously on .NET Core 3.0, but there was only one reported incident. It looks like the leak occurs even if the total memory usage doesn’t grow noticeably, so it’s possible that this just wasn’t noticed in previous releases of our app.
Other information
The leak report for the min-reproducible example can be found in the project README. I’m including the sections responsible for the most memory usage (found in the leak reports from other affected systems) below. There are two main sections. The first is the SSL handshake attempt:
STACK OF 1 INSTANCE OF 'ROOT LEAK: <SecTrust>':
42 libsystem_pthread.dylib 0x7fff6b1cbb8b thread_start + 15
41 libsystem_pthread.dylib 0x7fff6b1d0109 _pthread_start + 148
40 libcoreclr.dylib 0x1010c12a4 CorUnix::CPalThread::ThreadEntry(void*) + 436
39 libcoreclr.dylib 0x10126db8f ThreadpoolMgr::WorkerThreadStart(void*) + 1311
38 libcoreclr.dylib 0x101240154 ManagedPerAppDomainTPCount::DispatchWorkItem(bool*, bool*) + 276
37 libcoreclr.dylib 0x101249b20 ManagedThreadBase::ThreadPool(void (*)(void*), void*) + 32
36 libcoreclr.dylib 0x101249503 ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 323
35 libcoreclr.dylib 0x1012a3538 QueueUserWorkItemManagedCallback(void*) + 184
34 libcoreclr.dylib 0x101288639 MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 1657
33 libcoreclr.dylib 0x10143c8fb CallDescrWorkerInternal + 124
32 ??? 0x109384766 0x7fffffffffffffff + 9223372041304426343
31 ??? 0x10924ed8d 0x7fffffffffffffff + 9223372041303158158
30 ??? 0x109ee4798 0x7fffffffffffffff + 9223372041316353945
29 ??? 0x109ee629a 0x7fffffffffffffff + 9223372041316360859
28 ??? 0x10937930d 0x7fffffffffffffff + 9223372041304380174
27 ??? 0x10938423e 0x7fffffffffffffff + 9223372041304425023
26 ??? 0x108f57eed 0x7fffffffffffffff + 9223372041300049646
25 ??? 0x107ef0400 0x7fffffffffffffff + 9223372041282847745
24 ??? 0x109397586 0x7fffffffffffffff + 9223372041304503687
23 ??? 0x10939905c 0x7fffffffffffffff + 9223372041304510557
22 ??? 0x1093992bb 0x7fffffffffffffff + 9223372041304511164
21 ??? 0x107ef0400 0x7fffffffffffffff + 9223372041282847745
20 ??? 0x109397586 0x7fffffffffffffff + 9223372041304503687
19 ??? 0x10939905c 0x7fffffffffffffff + 9223372041304510557
18 ??? 0x1093992bb 0x7fffffffffffffff + 9223372041304511164
17 ??? 0x107ef0400 0x7fffffffffffffff + 9223372041282847745
16 ??? 0x109397452 0x7fffffffffffffff + 9223372041304503379
15 ??? 0x109397631 0x7fffffffffffffff + 9223372041304503858
14 ??? 0x109397af5 0x7fffffffffffffff + 9223372041304505078
13 ??? 0x10939894e 0x7fffffffffffffff + 9223372041304508751
12 ??? 0x109398ac2 0x7fffffffffffffff + 9223372041304509123
11 ??? 0x10934c526 0x7fffffffffffffff + 9223372041304196391
10 System.Security.Cryptography.Native.Apple.dylib 0x101e816ee 0x101e7d000 + 18158
9 com.apple.security 0x7fff3da8c909 SSLHandshake + 185
8 com.apple.security 0x7fff3da8ca29 SSLHandshakeProceed + 185
7 libcoretls.dylib 0x7fff68609999 tls_handshake_process + 85
6 libcoretls.dylib 0x7fff6860a0eb SSLProcessHandshakeRecordInner + 219
5 com.apple.security 0x7fff3dcacdac tls_verify_peer_cert + 71
4 com.apple.security 0x7fff3dcaccff sslCreateSecTrust + 47
3 libcoretls_cfhelpers.dylib 0x7fff6861c23e tls_helper_create_peer_trust + 222
2 com.apple.security 0x7fff3da5ff43 SecTrustCreateWithCertificates + 918
1 com.apple.CoreFoundation 0x7fff310e9663 _CFRuntimeCreateInstance + 597
0 libsystem_malloc.dylib 0x7fff6b181d9e malloc_zone_malloc + 140
The second is the thread start code:
STACK OF 10 INSTANCES OF 'ROOT LEAK: malloc<144>':
7 libsystem_pthread.dylib 0x7fff6b1cbb8b thread_start + 15
6 libsystem_pthread.dylib 0x7fff6b1d0109 _pthread_start + 148
5 libcoreclr.dylib 0x1010c12a4 CorUnix::CPalThread::ThreadEntry(void*) + 436
4 libcoreclr.dylib 0x10126fcc6 ThreadpoolMgr::GateThreadStart(void*) + 118
3 libcoreclr.dylib 0x1012e53a6 EETlsSetValue(unsigned int, void*) + 22
2 libcoreclr.dylib 0x1011938cd CExecutionEngine::CheckThreadState(unsigned int, int) + 61
1 libcoreclr.dylib 0x10109a698 HeapAlloc + 40
0 libsystem_malloc.dylib 0x7fff6b181d9e malloc_zone_malloc + 140
I have opened https://github.com/aws/aws-sdk-net/issues/1629 on the GitHub repo for the AWS SDK for .NET. However, because these two stack traces seem to point to .NET SDK native code, I have opened this issue directly with the .NET repo as well.
About this issue
- State: closed
- Created 4 years ago
- Comments: 36 (24 by maintainers)
fixed for 5.0 with #41989
FYI: This is now fixed in 6.0 in PR #41657 (it was merged on 9/1). We are discussing options to backport it into 5.0.
Yes, essentially this is what I was looking for. With that and the #41657 fix I can get a clean run:
I took a look at the repro. Just to make sure, I set up a Kinesis stream in AWS with valid IAM credentials that could PUT to it.
One thing that I noticed in the repro is that there is a task that is not being awaited here.
https://github.com/rharpavat/kinesis-dotnet-macos-memoryleak/blob/416b01bf3f322c4ba9df53ff1eba5087db796b93/Program.cs#L35
So the SendRecord was essentially a fire-and-forget, but the Thread.Sleep prevented it from growing very rapidly.
I fixed up the example to use async / await and I was able to publish a few thousand messages to Kinesis, and memory usage never exceeded 30 MB on macOS 10.15. You can see the exact code I ran here (minus the IAM credentials of course).
https://github.com/vcsjones/kinesis-dotnet-macos-memoryleak/commit/e9a3942cb77956d868b320328424514d2c350998
I’m not familiar enough with async / await to know if a missing await (fire and forget) could leak memory in some circumstances.
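To make the difference between the two call styles concrete, here is a minimal sketch. It does not use the AWS SDK and does not demonstrate a leak: SendRecordAsync below is a hypothetical stand-in that only simulates latency with Task.Delay, and the method names and the 50 ms delay are assumptions for illustration. It only shows that un-awaited calls overlap, whereas awaited calls run one at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Program
{
    private static readonly object Gate = new object();
    private static int _inFlight;     // sends currently running
    private static int _maxInFlight;  // highest number of overlapping sends seen

    // Hypothetical stand-in for the repro's SendRecord; no network call is made.
    private static async Task SendRecordAsync()
    {
        lock (Gate)
        {
            _inFlight++;
            _maxInFlight = Math.Max(_maxInFlight, _inFlight);
        }

        await Task.Delay(50); // simulated request latency (assumed value)

        lock (Gate)
        {
            _inFlight--;
        }
    }

    // Awaited: the next send starts only after the previous one has completed.
    public static async Task<int> RunAwaitedAsync(int count)
    {
        lock (Gate) { _inFlight = 0; _maxInFlight = 0; }
        for (int i = 0; i < count; i++)
        {
            await SendRecordAsync();
        }
        return _maxInFlight;
    }

    // Fire-and-forget: each send is started without waiting for the previous one.
    public static async Task<int> RunFireAndForgetAsync(int count)
    {
        lock (Gate) { _inFlight = 0; _maxInFlight = 0; }
        var pending = new List<Task>();
        for (int i = 0; i < count; i++)
        {
            pending.Add(SendRecordAsync()); // not awaited inside the loop
        }
        // Awaited here only so the demo can report a result and exit cleanly;
        // a true fire-and-forget caller would drop the tasks entirely.
        await Task.WhenAll(pending);
        return _maxInFlight;
    }

    public static async Task Main()
    {
        Console.WriteLine($"awaited: max in flight = {await RunAwaitedAsync(5)}");
        Console.WriteLine($"fire-and-forget: max in flight = {await RunFireAndForgetAsync(5)}");
    }
}
```

In the awaited loop at most one send is in flight at any time. In the fire-and-forget loop the sends overlap, so the number of outstanding operations depends on how quickly the loop issues them, which is consistent with Thread.Sleep slowing the growth in the original repro.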