prisma: MongoDB `findUnique()` or `findUniqueOrThrow()` leads to COLLSCAN if multiple queries happen at the same time

Bug description

in mongodb a simple findUnique or findUniqueOrThrow can lead to a colscan if multiple queries to the same collection happen at the same time:

prisma.user
    .findUnique({
      where: {
        id: "some-id"
      },
      select: {
        roles: true,
      },
    })

prisma.user
    .findUnique({
      where: {
        id: "some-id-2"
      },
 
    })

it creates this query:


[
      {
        "$match": {
          "$expr": {
            "$and": [
              {
                "$in": [
                  "$_id",
                  {
                    "$literal": [
                      "some-id",
                      "some-id-2"
                    ]
                  }
                ]
              },
              {
                "$ne": [
                  "$_id",
                  "$$REMOVE"
                ]
              }
            ]
          }
        }
      },
      {
        "$project": {
          "_id": 1,
          //...
        }
      }
    ],
      

which leads to a colscan because of this bug: https://github.com/prisma/prisma/issues/21453 which can take several seconds depending on the collection size and load

while this query which you would expect takes only some miliseconds:

{
    
          "$expr": {
            "$and": [
              {
               "_id": "some-id"
              },
              {
                "$ne": [
                  "$_id",
                  "$$REMOVE"
                ]
              }
            ]
          }
        
}

findUnique or findUniqueOrThrow do not have this problem

How to reproduce

The error happens if there are multiple queries at the same time

i assume this is because an attempted optimization in prisma: prisma batches multiple queries on the same collection and this leads to a id: {$in: [..]} aggregation. This is slow in prisma https://github.com/prisma/prisma/issues/21453

This repo here shows the behavior: https://github.com/macrozone/prisma-mongodb-collscan-repro

Expected behavior

findUnique and findUniqueOrThrow never do a colscan. They are expected to be fast and efficient

Prisma information

see above

Environment & setup

  • OS: macos
  • Database: mongodb atlas
  • Node.js version: 18

Prisma Version

prisma                  : 5.9.1
@prisma/client          : 5.9.1
Computed binaryTarget   : darwin-arm64
Operating System        : darwin
Architecture            : arm64
Node.js                 : v20.11.0
Query Engine (Node-API) : libquery-engine 23fdc5965b1e05fc54e5f26ed3de66776b93de64 (at node_modules/@prisma/engines/libquery_engine-darwin-arm64.dylib.node)
Schema Engine           : schema-engine-cli 23fdc5965b1e05fc54e5f26ed3de66776b93de64 (at node_modules/@prisma/engines/schema-engine-darwin-arm64)
Schema Wasm             : @prisma/prisma-schema-wasm 5.9.0-32.23fdc5965b1e05fc54e5f26ed3de66776b93de64
Default Engines Hash    : 23fdc5965b1e05fc54e5f26ed3de66776b93de64
Studio                  : 0.497.0

About this issue

  • Original URL
  • State: open
  • Created 5 months ago
  • Reactions: 3
  • Comments: 28 (8 by maintainers)

Most upvoted comments

We are facing similar issues with these non-optimized queries. Some of our collection grew a lot in the recent weeks and now these COLLSCAN became a serious issue. A simple findUnique is taking +30 seconds to resolve on a 2.5mi records collection.

By enabling the debug on Prisma queries, we can tell all of it is due to the misusage of $expr + $in. Seems like any Prisma query will lazily just wrap your query with the $expr command instead of building a more specific query.

As a workaround, we manage to replace some of our queries with a Prisma OR operator.

So instead of something like

prisma.users.findMany({ where: { id: { in: sortedMemberIds } })

We replaced with

prisma.users.findMany({ where: { OR: sortedMemberIds.map((id) => ({ id })) }})

It’s a silly workaround and doesn’t work for all the cases, but it helps for now. The problem is even worse in our project since we expose a GraphQL endpoint, and rely on Prisma/Nexus/PalJS generated resolvers. All of those will generate the offender queries. To mitigate, we introduced custom resolvers that will do the workaround query described above. It’s a huge drawback to the project, as we need to handle each individual resolver manually now.

From quickly scanning Prisma engine codebase, I think calls like these here are the culprit (I’m nowhere near to understand Rust code 😅). Instead of just passing the query to the $expr command, it should be building the mongodb specific query depending on the requested filters. Not a trivial task, I guess, but it’s what I’d expect from a data access library.

Overall, we had a bunch of drawback with the Prisma + MongoDB integration. We manage to work around some of them, but it’s becoming unsustainable. It was first introduced because we wanted to better integrate with our new GraphQL services and not having to write resolvers manually for everything. As it stands now, we are back on writing new code with mongoose.

I checked the logs again from the production, made sure that it does not run any old version of the code where findUnique or findUniqueAndThrow is used and still see quite a lot of COLLSCANS from $in-queries on the primary key. I created therefore a separate issue https://github.com/prisma/prisma/issues/23775

We are facing similar issues with these non-optimized queries. Some of our collection grew a lot in the recent weeks and now these COLLSCAN became a serious issue. A simple findUnique is taking +30 seconds to resolve on a 2.5mi records collection.

By enabling the debug on Prisma queries, we can tell all of it is due to the misusage of $expr + $in. Seems like any Prisma query will lazily just wrap your query with the $expr command instead of building a more specific query.

As a workaround, we manage to replace some of our queries with a Prisma OR operator.

So instead of something like

prisma.users.findMany({ where: { id: { in: sortedMemberIds } })

We replaced with

prisma.users.findMany({ where: { OR: sortedMemberIds.map((id) => ({ id })) }})

It’s a silly workaround and doesn’t work for all the cases, but it helps for now. The problem is even worse in our project since we expose a GraphQL endpoint, and rely on Prisma/Nexus/PalJS generated resolvers. All of those will generate the offender queries. To mitigate, we introduced custom resolvers that will do the workaround query described above. It’s a huge drawback to the project, as we need to handle each individual resolver manually now.

From quickly scanning Prisma engine codebase, I think calls like these here are the culprit (I’m nowhere near to understand Rust code 😅). Instead of just passing the query to the $expr command, it should be building the mongodb specific query depending on the requested filters. Not a trivial task, I guess, but it’s what I’d expect from a data access library.

Overall, we had a bunch of drawback with the Prisma + MongoDB integration. We manage to work around some of them, but it’s becoming unsustainable. It was first introduced because we wanted to better integrate with our new GraphQL services and not having to write resolvers manually for everything. As it stands now, we are back on writing new code with mongoose.

the issue for a {where: {id: {in: array}}} is tracked here, i think: https://github.com/prisma/prisma/issues/21453

This here is about colscan that are caused by findUnique. Do you also observe those in your project?

I am serious though, a quick issue from you with a plain and obvious reproduction would go a long way. You clearly have a reproduction case now, and as a new issue we would also try to reproduce and then potentially prioritize that.

The initial description is still correct. findUnique(OrThrow) lead to a colscan due to wrongly implemented batching. I hope its clear how problematic this issue is in production 😔. (and many of the other reported and well documented mongodb performance problems.

I can offer to create a repo with some test cases though, maybe i can do that over the weekend.

@macrozone you could do a client extension that would convert findUnique to findFirst automatically.

@macrozone if you want to opt out of batching, you can use findFirst(OrThrow) instead of findUnique(OrThrow).

If you are using composite unique you will need to flatten where:

prisma.model.findUnique({
  where: {
     name_email: {
         name,
         email
     }
  }
})

becomes:

prisma.model.findUnique({
  where: {
      name,
      email
  }
})

For plain unique, just replacing findUnique with findFirst is enough.

@janpio is it possible to disable batching? This bug is very serious and we need to take immediate action 😢