prisma: Unexpectedly high memory usage when executing large number of `upsert` statements

Bug description

The application I’m working on ingests bulk data and upserts it into a database using Prisma. The scale of this data is large, but not huge: approximately 250,000 records, with each record being fairly small (6 strings, a float, and a datetime). We’re looking to run this process on the smallest hardware possible, so I’ve been investigating running it in a memory-constrained environment.

What I’ve found is that our application requires around 1.2GB of memory in order to perform this upsert process. Upon investigation, I found that if I temporarily disable the single line that actually upserts the data (prismaClient.upsert()), the memory usage drops dramatically to 400MB, implying Prisma is responsible for 2/3 of our memory usage.

Further analysis has shown that the memory jump occurs pretty rapidly, within the first 50,000 records or so, at which point the memory usage reaches 1.2GB and stays roughly in that range for the rest of the processing. For this reason, I would not classify this as a “leak” - the memory usage does eventually plateau. However, you still need to upsert a fairly significant number of records to see the problem - only 1,000 or so won’t trigger it.

The following graph shows the JS Heap usage and the overall process memory usage (RSS), with the single upsert statement enabled and disabled:

(image: JS heap and RSS memory usage graph, upsert statement enabled vs. disabled)

As you can see, the JS heap usage actually looks pretty reasonable in both cases, but the total process memory is significantly higher with the statement enabled. This suggests the memory is likely being consumed by the Prisma engine.
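
For reference, here is a minimal sketch of how a heap vs. RSS comparison like the one above can be sampled (this is an assumption about the measurement setup, not the exact code used to produce the graph):

    // Sample heap and RSS once per second while the ingest runs.
    // process.memoryUsage() is a standard Node.js API: rss covers the whole process
    // (with the library/Node-API engine the Rust query engine runs in-process, so its
    // allocations count towards RSS but not the JS heap); heapUsed is JS-only.
    const timer = setInterval(() => {
      const { rss, heapUsed } = process.memoryUsage()
      console.log(`rss=${Math.round(rss / 1e6)}MB heapUsed=${Math.round(heapUsed / 1e6)}MB`)
    }, 1000)

    // Call clearInterval(timer) once the upsert run has finished.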

A couple of other tidbits of info:

  • I investigated changing the statement cache size; this did not make a significant difference, and in fact made the memory usage slightly higher
  • Replacing the upsert with an update did not noticeably affect the memory usage either way
  • Replacing the upsert with a select also showed abnormally high memory usage, though it was reduced to approximately 2/3 of the previous level.
  • Records are inserted in parallel, but the parallelism is limited to 10 (see the sketch after this list). I assume this will result in at least 10 database connections. Removing the parallelism does reduce the memory usage, but the process is then unacceptably slow.
  • The node process is being run with --max-old-space-size=512 to ensure aggressive garbage collection. If this flag is removed, the same problem happens, but the graph above doesn’t look as nice due to GC lag.
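
For illustration, here is a minimal sketch of the kind of parallelism-limited upsert loop described above (it assumes a `prisma` client and a `theObjects` array as in the code further below; processing in chunks of 10 approximates, rather than exactly reproduces, our "parallelism limited to 10" setup):

    const CONCURRENCY = 10

    for (let i = 0; i < theObjects.length; i += CONCURRENCY) {
      // Process the records in chunks of 10, waiting for each chunk to finish
      // before starting the next, so at most 10 upserts are in flight at once.
      const chunk = theObjects.slice(i, i + CONCURRENCY)
      await Promise.all(chunk.map(object => {
        const { id, ...updatedObject } = object
        return prisma.theObject.upsert({
          where: { id },
          update: updatedObject,
          create: object,
        })
      }))
    }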

Overall, I’d just love to know if I’m missing something entirely, if I’m nuts, etc, or if this seems like a real problem!

How to reproduce

  1. Create a program that upserts data in a loop
  2. Perform upserts in parallel with a maximum parallelism of 10
  3. Upsert 250k records (although as few as 50k may be enough)
  4. Observe memory usage

Expected behavior

Memory usage should remain relatively low and not climb dramatically around 50k records. I would expect the cost of using Prisma to be no more than 200MB in this situation (correct me if I’m totally wrong about this). At minimum, I would not expect the memory usage to be related to the total number of records upserted.

Prisma information

(names of objects/fields altered for privacy)

generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["referentialIntegrity"] 
}

model TheObject {
  id      String   @id
  alpha   String
  bravo   String
  charlie String
  delta   DateTime
  echo    String?
  foxtrot Float?
  golf    String?

  @@index([bravo, charlie])
  @@map("the_object")
}

datasource db {
  provider             = "postgresql"
  url                  = env("DATABASE_URL")
  referentialIntegrity = "prisma"
}

    await prisma.$transaction(theObjects.map(object => {
      const { id, ...updatedObject } = object

      return prisma.theObject.upsert({
        where: { id },
        update: updatedObject,
        create: object
      })
    }))

(I also tried it without $transaction(), same result)

Environment & setup

  • OS: MacOS Monterey 12.6 (also observed on Amazon Linux 2)
  • Database: PostgreSQL 14.5
  • Node.js version: v14.19.2

Prisma Version

prisma                  : 4.7.1
@prisma/client          : 4.7.1
Current platform        : darwin
Query Engine (Node-API) : libquery-engine 272861e07ab64f234d3ffc4094e32bd61775599c (at node_modules/@prisma/engines/libquery_engine-darwin.dylib.node)
Migration Engine        : migration-engine-cli 272861e07ab64f234d3ffc4094e32bd61775599c (at node_modules/@prisma/engines/migration-engine-darwin)
Introspection Engine    : introspection-core 272861e07ab64f234d3ffc4094e32bd61775599c (at node_modules/@prisma/engines/introspection-engine-darwin)
Format Binary           : prisma-fmt 272861e07ab64f234d3ffc4094e32bd61775599c (at node_modules/@prisma/engines/prisma-fmt-darwin)
Format Wasm             : @prisma/prisma-fmt-wasm 4.7.1-1.272861e07ab64f234d3ffc4094e32bd61775599c
Default Engines Hash    : 272861e07ab64f234d3ffc4094e32bd61775599c
Studio                  : 0.477.0
Preview Features        : referentialIntegrity

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 26 (9 by maintainers)

Most upvoted comments

I am facing the same problem. The application inserts about 20k-50k rows into the database (PostgreSQL) every 15 minutes. The RSS memory goes up while the heap stays stable. Reducing the number of inserts helps memory usage.

Screenshot 2023-02-14 at 2 36 52 PM

Here are my Prisma installation details:

prisma                  : 4.9.0
@prisma/client          : 4.9.0
Current platform        : darwin-arm64
Query Engine (Node-API) : libquery-engine ceb5c99003b99c9ee2c1d2e618e359c14aef2ea5 (at ../../node_modules/@prisma/engines/libquery_engine-darwin-arm64.dylib.node)
Migration Engine        : migration-engine-cli ceb5c99003b99c9ee2c1d2e618e359c14aef2ea5 (at ../../node_modules/@prisma/engines/migration-engine-darwin-arm64)
Introspection Engine    : introspection-core ceb5c99003b99c9ee2c1d2e618e359c14aef2ea5 (at ../../node_modules/@prisma/engines/introspection-engine-darwin-arm64)
Format Binary           : prisma-fmt ceb5c99003b99c9ee2c1d2e618e359c14aef2ea5 (at ../../node_modules/@prisma/engines/prisma-fmt-darwin-arm64)
Format Wasm             : @prisma/prisma-fmt-wasm 4.9.0-42.ceb5c99003b99c9ee2c1d2e618e359c14aef2ea5
Default Engines Hash    : ceb5c99003b99c9ee2c1d2e618e359c14aef2ea5
Studio                  : 0.479.0

We’ve found some good news on this issue. We’ve tested a large number of upsert statements on Node.js v16, v18, and v20.

Prisma v4.16.2: (memory usage graph image)

Prisma v5.1.1: (memory usage graph image)

Looks like both Prisma versions have this memory issue on Node.js v16 and no longer have it on v18 and v20.

Same issue here. The affected application consistently upserts records in batches of 1000, and this is what the memory usage looks like (memory only goes down after the application runs out of memory and restarts):

Screenshot 2023-03-02 at 09 00 38

The problem only occurred after updating @prisma/client from 4.8.1 to 4.10.0.

I was able to reproduce it locally with this code and tested it with different versions of Prisma:

  // Note: `ids` is assumed to be a pre-built array of record IDs reused on every
  // iteration, `randomInt` to come from node:crypto, and `this.prisma` to be a
  // PrismaClient instance.
  async upsert(count: number) {
    for (let i = 1; i <= count; i++) {
      await Promise.all(
        ids.map(async (id) => {
          const data = { id, name: randomInt(100000).toString() };
          await this.prisma.test.upsert({
            where: { id: data.id },
            create: data,
            update: data,
          });
        })
      );
    }
  }
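
A hypothetical driver for the repro above might look like the following (the names are illustrative and not from the original comment; it assumes `service` is an instance of the class containing the `upsert` method and that `ids` is the fixed array of 1,000 IDs that method refers to):

    // Re-upsert the same 1,000 rows 100 times and watch RSS grow while the heap stays flat.
    const ids = Array.from({ length: 1000 }, (_, i) => `id-${i}`);
    await service.upsert(100);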

Here are the results:

Screenshot 2023-03-02 at 10 05 03

Environment (local)

  • OS: Windows 10
  • Database: PostgreSQL (13.6)
  • Node.js version: v18.12.1

Prisma Version (local)

prisma                  : 4.11.0
@prisma/client          : 4.11.0
Current platform        : windows
Query Engine (Node-API) : libquery-engine 8fde8fef4033376662cad983758335009d522acb (at node_modules\@prisma\engines\query_engine-windows.dll.node)
Migration Engine        : migration-engine-cli 8fde8fef4033376662cad983758335009d522acb (at node_modules\@prisma\engines\migration-engine-windows.exe)
Format Wasm             : @prisma/prisma-fmt-wasm 4.11.0-57.8fde8fef4033376662cad983758335009d522acb
Default Engines Hash    : 8fde8fef4033376662cad983758335009d522acb
Studio                  : 0.483.0

No.

We also had the problem of RSS increasing until it killed the server. We downgraded prisma and @prisma/client to 4.8.1 as mentioned above and haven’t observed it since. (screenshot: memory graph, 2023-06-21, 10:22 AM) The light blue line increasing over the first day is RSS; the others are heap and external size. We started monitoring memory and found the downgrade solution within a day, so the increase only shows once, but it had been happening for a long time. The flat line at the end is after downgrading to 4.8.1.

Just in case more cases are needed.

Hey folks! I was able to confirm the repro that @backend-sungjee linked, and I was also able to confirm that the issue disappeared when updating from Node 16 to Node 18+ (see the repro PR with data). Here is some of the data from that PR:

Prisma v4.8.1 on Node v16.19.1

The memory graph looks as expected: stable and not growing.
(Memory Usage Graph in MB) (Event Loop Delay Graph in MS)

Prisma v4.9.0 on Node v16.19.1

Here we can see RSS continually growing, which could be a memory leak. There are also growing spikes in the event loop delay.
(Memory Usage Graph in MB) (Event Loop Delay Graph in MS)

Prisma v4.9.0 on Node v18.19.0

And here we can see that after migrating to Node 18, the issue no longer persists; the memory consumption is stable over time.
(Memory Usage Graph in MB) (Event Loop Delay Graph in MS)

Cross-posting another set of results that @Jolg42 wrote up here, which were similarly resolved by upgrading from Node 16 → Node 18+: https://github.com/prisma/prisma/issues/17925#issuecomment-1801976996


@abrindam can you confirm if you still see this issue when upgrading to Node 18+?

For the folks for whom this issue persists on Node 18+, please make a new issue so that we can consolidate all relevant information in one spot. If possible, it should ideally include a minimal reproduction of the suspected memory leak / high memory usage on Node 18+, i.e.:

  • The code
  • The Prisma schema
  • The specific versions of Prisma this happens on (does this still happen on the latest version?)
  • The specific Node.js version used
  • Does this happen locally or in conjunction with a platform?

@janpio We primarily use the library engine. We only switched to the binary one to see whether the memory leak is really in the Prisma engine and not in our app. And the memory is steadily increasing regardless of the engine type.

We run our apps on the DigitalOcean App Platform: Linux 4.4.0, Node.js v18.17.1, PostgreSQL 14.

We previously used Node.js v16 and the problem was visible there as well.

@janpio

It’s my local environment:

  • OS: macOS Ventura (13.0)
  • Database: PostgreSQL 14.8
  • Node.js version: v16.16.0

I’m sorry, but I haven’t been able to determine whether it’s unused memory or memory that is still allocated and in use.

For our service, we were using AWS instances created by AWS Elastic Beanstalk, with the instance directly executing Node: Node 18 on Amazon Linux 2, and the DB was AWS Aurora MySQL 8.0 (8.0.mysql_aurora.3.02.2). The Beanstalk instances were either t4g or m6g, both ARM architecture.

Yes, there is discussion and a plan to work on fixing this, but there is no ETA I could communicate for when we will get to it - sorry.

What would be helpful is a standalone reproduction of the problem that we can easily run by following some instructions to see the problem in action. That will make it much easier for us to get started here.