prisma: Memory leaks in Jest when running tests serially with `nApi` enabled
Bug description
Background
Currently trying to migrate to the library engine for an existing project (>1kloc Prisma schema). After enabling the `nApi` flag, I’ve noticed that Jest will periodically crash while running test suites serially (hundreds of tests spread across almost 30 test suites).
After taking a closer look using Jest’s `--logHeapUsage` flag, I’ve found that memory usage goes up by 100MB+ per test suite. About a third of the way into a test run, Jest would be eating up over 2GB of memory and crash soon after.
Limitations
Unfortunately, I’ve found no successful mechanism that would allow running tests in parallel in an isolated environment when using Prisma. I’ve tried setting up a test environment that creates temporary schemas (since Prisma doesn’t seem to allow the usage of `pg_temp`) and applies migrations for each suite, but haven’t achieved desirable results.
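For reference, a minimal sketch of that approach (the file name and helper shape are hypothetical, and it assumes `DATABASE_URL` has no existing query string): a Jest `globalSetup` that points the run at a fresh Postgres schema and applies migrations into it.

```ts
// jest.globalSetup.ts (hypothetical file) — give each run its own schema
import { execSync } from 'node:child_process';
import { randomUUID } from 'node:crypto';

export default async function globalSetup(): Promise<void> {
  // unique schema name per run; assumes DATABASE_URL has no query string yet
  const schema = `test_${randomUUID().replace(/-/g, '')}`;
  const url = `${process.env.DATABASE_URL}?schema=${schema}`;
  process.env.DATABASE_URL = url;

  // apply the committed migrations into the fresh schema
  execSync('npx prisma migrate deploy', {
    env: { ...process.env, DATABASE_URL: url },
    stdio: 'inherit',
  });
}
```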
The problem
Instantiating a library engine leads to memory leaks when using Jest (barebones example), which is noticeable when running tests using the `--runInBand` flag. The issue also gets picked up when using `--detectLeaks`.
I’ve also tested a version of the library engine with logging disabled (repo) and did not see the issue: neither on a simple instantiation (barebones example), nor when using it in a generated Prisma client (by manually replacing the path in `node_modules/@prisma/client/runtime/index.js`).
How to reproduce
Minimal reproduction repo
https://github.com/driimus/prisma-leaks - see the action runs for logs
Steps:
- Enable the `nApi` feature
- Run some Jest test suites using the `--runInBand` or `-w 1` flag and monitor memory usage (e.g. by also using the `--logHeapUsage` flag)
- Note the memory usage going up
- When running large test suites, the runner may crash:
```
<--- Last few GCs --->
[594:0x5d8b870] 327150 ms: Scavenge (reduce) 1912.3 (2075.9) -> 1912.0 (2076.4) MB, 3.8 / 0.0 ms (average mu = 0.394, current mu = 0.400) allocation failure
[594:0x5d8b870] 327155 ms: Scavenge (reduce) 1912.7 (2076.4) -> 1912.3 (2076.4) MB, 3.0 / 0.0 ms (average mu = 0.394, current mu = 0.400) allocation failure
[594:0x5d8b870] 327162 ms: Scavenge (reduce) 1913.0 (2076.4) -> 1912.6 (2076.9) MB, 3.0 / 0.0 ms (average mu = 0.394, current mu = 0.400) allocation failure
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xb02cd0 node::Abort() [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
2: 0xa1812d node::FatalError(char const*, char const*) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
3: 0xceb72e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
4: 0xcebaa7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
5: 0xeb5485 [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
6: 0xeb5f74 [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
7: 0xec43e7 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
8: 0xec779c v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
9: 0xe89d25 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
10: 0xe82934 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
11: 0xe84630 v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
12: 0x110da42 v8::internal::String::SlowFlatten(v8::internal::Isolate*, v8::internal::Handle<v8::internal::ConsString>, v8::internal::AllocationType) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
13: 0x1097e77 v8::internal::JSRegExp::Initialize(v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, v8::base::Flags<v8::internal::JSRegExp::Flag, int>, unsigned int) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
14: 0x10987ff v8::internal::JSRegExp::Initialize(v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::String>) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
15: 0x120b798 v8::internal::Runtime_RegExpInitializeAndCompile(int, unsigned long*, v8::internal::Isolate*) [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
16: 0x15cddf9 [/home/driimus/.nvm/versions/node/v16.7.0/bin/node]
Aborted
```
Expected behavior
No response
Prisma information
Schema: https://github.com/driimus/prisma-leaks/blob/main/prisma/schema.prisma
Environment & setup
- OS: macOS, Debian
- Database: PostgreSQL
- Node.js version: v16.7.0, LTS
Prisma Version
```
prisma : 2.30.0
@prisma/client : 2.30.0
Current platform : debian-openssl-1.1.x
Query Engine (Binary) : query-engine 60b19f4a1de4fe95741da371b4c44a92f4d1adcb (at node_modules/@prisma/engines/query-engine-debian-openssl-1.1.x)
Migration Engine : migration-engine-cli 60b19f4a1de4fe95741da371b4c44a92f4d1adcb (at node_modules/@prisma/engines/migration-engine-debian-openssl-1.1.x)
Introspection Engine : introspection-core 60b19f4a1de4fe95741da371b4c44a92f4d1adcb (at node_modules/@prisma/engines/introspection-engine-debian-openssl-1.1.x)
Format Binary : prisma-fmt 60b19f4a1de4fe95741da371b4c44a92f4d1adcb (at node_modules/@prisma/engines/prisma-fmt-debian-openssl-1.1.x)
Default Engines Hash : 60b19f4a1de4fe95741da371b4c44a92f4d1adcb
Studio : 0.422.0
Preview Features : nApi
```
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 15
- Comments: 43 (18 by maintainers)
I had a look at this today. I think this is happening because we keep a collection of engines around here: https://github.com/prisma/prisma/blob/f395abad8eb85739a1df35071caa9e5050993696/packages/engine-core/src/library/LibraryEngine.ts#L44
Each time we create a new PrismaClient we create a new engine, which duplicates the whole engine and uses a lot of memory. I think we need to change the way we create the library engine so that only one engine is used.
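To illustrate the pattern being described (hypothetical names, not the actual `LibraryEngine` source): a module-level collection plus a long-lived `process` listener keeps every engine instance reachable for the lifetime of the process, so none of them can be garbage collected.

```ts
// simplified illustration — a registry that outlives every test file
const engines: LibraryEngine[] = [];

class LibraryEngine {
  constructor() {
    // each `new PrismaClient()` constructs an engine and registers it here
    engines.push(this);
  }
}

process.on('beforeExit', () => {
  // this closure captures `engines`, so V8 can never collect any of them
  console.log(`engines still alive: ${engines.length}`);
});
```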
Any news? This is quite a critical bug for anyone that uses Prisma in prod with a CI pipeline
I can confirm my company’s Jest CI tests started leaking like crazy after upgrading to Prisma 3.x. This was the only changed variable so there’s definitely something happening. I am trying to reproduce with a simpler setup.
@Ustice thanks for looking into it a bit. The Prisma client team has been doing some deep diving into this, and we have found a few things. Using the engine or mocking it out doesn’t affect the heap size that much. We have tried a few things around the loading and removing of the engine and it made no difference. I also created another test sample that only loaded the engine without the TypeScript side, and then there was no heap memory increase.
So we’re looking in a few places to solve this. We have some ideas and are doing a fair amount of experimenting with different approaches to fix this.
@garrensmith not sure why that would be an issue given that jest sandboxes modules between test files.
@aqrln Those observations are somewhat correct, in the sense that running `--logHeapUsage` without exposing Node’s garbage collector won’t surface any abnormal usage patterns for very small sample sizes (n=10 in the case of your fork), because Jest won’t be able to free up unused memory as soon as possible (https://github.com/facebook/jest/blob/9edcaaffd1cdf418feca6d983cdfd364b73bf62c/packages/jest-runner/src/runTest.ts#L320).
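Paraphrasing what the linked `runTest.ts` line does after each test file: the heap is only sampled after a forced collection when the GC is exposed, so without `node --expose-gc` the reported numbers include garbage that simply hasn’t been collected yet.

```ts
// paraphrase of jest-runner's behavior under --logHeapUsage (not the exact source)
if (global.gc) {
  global.gc(); // only defined when node runs with --expose-gc
}
const heapUsed = process.memoryUsage().heapUsed; // what --logHeapUsage reports
```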
As a follow-up from my previous comment: it looks like the Prisma client’s in-memory size scales with the schema, which would explain why the minimal reproduction didn’t originally exhibit as severe an increase in heap size.
I’ve updated the original repo’s README with some additional metrics that showcase the issue in more detail. Pasting the important ones here:
- Comparison against the binary engine for a small schema
- Comparison against the modified library engine with logging disabled
The setup was as minimal as possible: `prisma.$connect` to instantiate the engine, no other operations.
I’ve also checked the modified engine into version control, in case anyone wants to give GH Actions a spin: https://github.com/driimus/prisma-leaks/tree/bench/libengine-without-logging
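Roughly what each test file in that minimal setup does (the file name and assertion are illustrative):

```ts
// engine.test.ts — connect, do nothing, disconnect
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

beforeAll(async () => {
  await prisma.$connect(); // instantiating the engine is the only "work"
});

afterAll(async () => {
  await prisma.$disconnect();
});

test('no queries are needed to observe the growth', () => {
  expect(prisma).toBeDefined();
});
```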
We are experiencing very high memory usage too: actually 1.4GB after just a single test suite (around 60 integration tests).
I like the idea of `$dispose`. Let’s see what others have to say here.
@janpio Not as of yet. I’m waiting on #4703 to explore test parallelization in more detail, since re-applying migrations plays a big part in using separate database schemas per test suite.
It’s a fairly big assumption on my part, but yes, I think the logging component causes leaks in `jest` test suites.
My understanding of Node-API and the query engine is very limited, but this line in particular was the main reason I started looking at how `jest` reacts to getting rid of the logging callback. This was entirely based on the documentation for `napi_unref_threadsafe_function`, as that seemed like something that the weak-napi package used by `jest` might detect.
@pimeys I’ve played around a fair bit with instantiating a `QueryEngine` when I was testing changes to my stubbed-out version in that repo: none of those changes have made a difference as far as the `--detectLeaks` flag is concerned.
More importantly: I’ve made two more changes to the minimal reproduction repo, one for importing a client, one for removing explicit disconnects. Hopefully the additional scenarios provide more insight.
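For context, this is roughly the check `--detectLeaks` performs: Jest’s leak detector (built on weak-napi) holds a weak reference to the test environment and reports a leak if it survives a forced collection. A simplified stand-in using `WeakRef` (requires `node --expose-gc`; real GC timing is less deterministic than shown here):

```ts
// simplified stand-in for jest-leak-detector's check
let environment: object | null = { /* pretend this holds a PrismaClient */ };
const ref = new WeakRef(environment);

environment = null; // drop our strong reference
global.gc?.();      // force a collection (needs --expose-gc)

// if e.g. a `process` listener still references the environment, this is true
console.log('leaked:', ref.deref() !== undefined);
```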
Please open a new issue and provide additional information. Your leak is 99.9% not the same one we fixed via this issue a year ago. Thanks.
We are experiencing memory leaks with:
- Node.js: 18
- Prisma: 4.11.0
When we inspect our heap, it’s littered with incrementally expanding Prisma strings. This crashes our CI pipeline.
I ran some tests on my machines using the reproduction repo by @driimus: https://github.com/aqrln/prisma-leaks/blob/main/stats.md.
As @janpio noted in https://github.com/prisma/prisma/issues/8989#issuecomment-910711794, there doesn’t seem to be a visible correlation between memory usage and N-API in this specific test: both library and binary engines produce similar numbers. It also looks like the memory might have been leaking in tests even before 3.0.0; it just wasn’t as noticeable. However, it does look like memory usage has become worse in recent versions on Linux, regardless of whether N-API is used, and it didn’t just happen once around 3.0.0: the numbers are even higher on 3.6.0.
In addition to that, Node.js version also affects numbers a lot. The drastic increase in memory usage with Prisma 3.6.0 on Linux seems to start happening on Node.js 16 and later (https://github.com/aqrln/prisma-leaks/blob/main/stats-by-version.md), which is consistent with the fact that the issue only became visible on our CI once we added Node.js 16 to the matrix: https://github.com/prisma/prisma/issues/10183.
The next steps would be to actually profile JavaScript heap allocations and memory usage and see what’s happening there.
We can reproduce the memory leak in CI (GitHub Actions) with Node.js v16 here https://github.com/prisma/prisma/pull/10020
We are having memory leaks in our pipeline too
I haven’t tried any real v4 tests yet, but here’s a test run for v4.1, which pertains to this specific issue https://github.com/driimus/prisma-leaks/actions/runs/2699611305 (take your pick of node version, schema size, test file count).
Unless you have a significant number of test files, the problem might lie somewhere else in your tests, as the loss now seems to be insignificant at ~2-3MB per file.
@ethancadoo are you running jest with the `--expose-gc` flag by any chance? If so, there is a separate issue on the v8 side; try running the tests without this flag. If you are not using it (or removing it does not help), please provide a reproduction — the cases reported here should be fixed as of `4.1.0`.
I’m running isolated tests in Prisma using vitest and https://www.npmjs.com/package/@chax-at/transactional-prisma-testing It mostly works
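The general idea behind that kind of isolation, hand-rolled as a hedged sketch (this is not that package’s actual API, and it assumes interactive transactions are available in your Prisma version): run the test body inside `$transaction` and throw to force a rollback, so every test starts from the same database state.

```ts
import { PrismaClient, Prisma } from '@prisma/client';

class Rollback extends Error {}

// run `testBody` against a transaction that is always rolled back
export async function withRollback(
  prisma: PrismaClient,
  testBody: (tx: Prisma.TransactionClient) => Promise<void>,
): Promise<void> {
  try {
    await prisma.$transaction(async (tx) => {
      await testBody(tx);
      throw new Rollback(); // abort: undoes every write made by the test
    });
  } catch (error) {
    if (!(error instanceof Rollback)) throw error; // real failures propagate
  }
}
```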
Thanks guys for your hard work on this issue
Any updates? I currently suffer from memory leaks on my server written in Nest.js when I run tests. GitHub Actions is crashing.
@garrensmith Thanks for pointing out where the problem was. This issue was blocking our adoption of Prisma. I created a PR for a fix. It refactors the way that `beforeExit` is handled so that there aren’t any undead references to the engines, allowing garbage collection. There are failing tests in `src/__tests__/MigrateDiff.test.ts`, but it looks like that is a bugged test? I’m not familiar enough with the overall codebase to know.
@driimus I’m pretty sure that the real leak is from here. There are references to the engines in `process` event handlers. While Jest isolates modules (mostly), references to `process` can and do leak.
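A sketch of what avoiding such an undead reference looks like (hypothetical shape, not the actual PR): keep a handle on the exact listener so it can be deregistered when the engine stops.

```ts
class Engine {
  // named reference to the handler, so it can be removed again later
  private beforeExitHandler = async (): Promise<void> => {
    await this.stop();
  };

  start(): void {
    process.on('beforeExit', this.beforeExitHandler);
  }

  async stop(): Promise<void> {
    // removing the listener releases `process`'s reference to `this`,
    // letting the engine be collected once the test file's sandbox is torn down
    process.removeListener('beforeExit', this.beforeExitHandler);
  }
}
```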
On my app it happens in Jest with `NODE_ENV=test`.
The first commit only helped get past `--detectLeaks` errors when instantiating a query engine. The second one got the tests in https://github.com/driimus/prisma-leaks to pass without an increase in heap usage.
I haven’t gotten as far as monitoring `nApi` behavior in a live environment. I’d consider minor leaks for long-running clients to be a separate issue, unless there’s a correlation between them and the amount of data being transferred in and out. As for explicitly calling `$disconnect`: if that’s the intended method of marking the engine for garbage collection, I could see that being a problem for Jest users (see 4).
Pretty much. As far as I know, Jest resets imported modules between test files (there’s also an even more aggressive sandboxing mechanism available through `resetModules`, though I haven’t checked whether that could impact the issue’s severity). Even if an instance of the Prisma Client is imported, there will be a separate copy per test file. I’m still unsure what the ~100MB of leaked memory consists of for the larger project. I’ll try to set up a more realistic example to see if I can get a definitive answer.
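The option referred to above, for reference (a minimal `jest.config.ts`, assuming Jest 28+ style types; older versions type this via `@jest/types` instead):

```ts
import type { Config } from 'jest';

const config: Config = {
  // by default each test file already gets its own module registry;
  // this goes further and resets it before every individual test
  resetModules: true,
};

export default config;
```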
@pimeys Thanks for the mention, I will dig into this problem this week.