drizzle-orm: [BUG]: drizzle-orm/planetscale-serverless reuses connection, leading to inconsistent issues

What version of drizzle-orm are you using?

0.29…3

What version of drizzle-kit are you using?

0.20.9

Describe the Bug

When using planetscale with drizzle, the planetscale connection is reused for subsequent queries to the db. This is an incorrect usage of the planetscale/database driver and can lead to issues with queries/returns in parallel transactions/queries. see: https://github.com/planetscale/database-js/issues/139 & https://github.com/mattrobenolt/ps-http-sim/issues/1

The issue was highlighted here: https://github.com/mattrobenolt/ps-http-sim/issues/7#issuecomment-1864636196

In my local testing: when using the planetscale/database driver directly, for each query a new connections+session is created when using via drizzle-orm/planetscale-serverless, a single connection is created and reused for each subsequent query until the connection is closed by the PS server

how this was discovered: A planetscale simulator was created by @mattrobenolt who also works at PS. the logs and errors returned by the simulator highlighted this discussion here: https://github.com/mattrobenolt/ps-http-sim/issues/7

how to test/replicate: use the planetscale simulator docker compose example and monitor logs: https://github.com/mattrobenolt/ps-http-sim run multiple drizzle queries in quick succession check docker logs for how many sessionIds were created

Run the same queries directly using @planetscale/database recheck docker logs and count sessionIds

example log

planetscale-simulator-proxy  | 13m32.708207828s |INFO| ok caller="ps-http-sim/main.go:252" method="Execute" 
.........
session="lU3bPM37IYbc_7IR1p_nz" client_session=true

Expected behavior

a new PS connection to be created for each db query as per @planetscale/database default

Environment & setup

No response

About this issue

  • Original URL
  • State: closed
  • Created 6 months ago
  • Reactions: 1
  • Comments: 18 (11 by maintainers)

Commits related to this issue

Most upvoted comments

This is a lot of text, so bear with me.

So after a little bit of investigation, I think the Drizzle docs set us up for failure here and encourage misuse unintentionally.

Their example is this:

import { drizzle } from "drizzle-orm/planetscale-serverless";
import { connect } from "@planetscale/database";

// create the connection
const connection = connect({
  host: process.env["DATABASE_HOST"],
  username: process.env["DATABASE_USERNAME"],
  password: process.env["DATABASE_PASSWORD"],
});

const db = drizzle(connection);

As seen on: https://orm.drizzle.team/docs/get-started-mysql

This right here is encouraging one global connection for the lifetime of your application, which then encourages no conscious behavior or when to create a new connection vs when to reuse a connection.

I’m not a JavaScript developer myself and don’t fully understanding all the contexts here, but having one global connection is going to be problematic for many reasons. If this is in a long lived application and this shared connection is used within say, some HTTP request handler, you’re inherently sharing state between discrete requests and whether intentionally or not, you’re going to be leaking state between different requests if the requests are happening in parallel, even if within a single request, you’re using it sequentially. The state is going to leak across different connections.

Take an example (ignore my ignorance in JavaScript patterns here:

// global connection
const db = drizzle(connection)

function handleHTTPRequest(req) {
  // pretend this is using drizzle, but same idea
  const user = await db.execute('select * from user where id=1')
  return JSON.stringify(user)
}

This is very naive and innocent, but we can assume in a long running applicaiton, handleHTTPRequest might be running on multiple requests at the same time concurrently. So our code inside the handler is not, but the runtime is invoking it in parallel.

In this example, this is technically fine. But let’s make it more complicated:

const db = drizzle(connection)

function handleHTTPRequest(req) {
  // again, I don't know how drizzle works, but I assume it can do transactions somehow
  const results = await db.transaction(async (tx) => {
    await tx.execute('INSERT INTO foo (a, b) VALUES (?, ?)')
    await tx.execute('INSERT INTO bar (c, d) VALUES (?, ?)')
    return [1, 2]
  })
  return JSON.stringify(results)
})

So here’s where things get interesting. This request handler translates into 4 sequential queries. Without this single handler, we are doing the correct thing, but the fact that this is sharing a global connection is where we have an issue. So if we think of what this is doing, this translates into these queries:

BEGIN;
INSERT INTO foo (a, b) VALUES (?, ?);
INSERT INTO bar (c, d) VALUES (?, ?);
COMMIT;

The BEGIN and COMMIT are implied as a part of our db.transaction() but you could have done this manually all on await db.execute("BEGIN"); await db.execute("COMMIT"), etc.

Now, these 4 queries require to be done over the same connection, and the order matters because the transaction state is maintained on the connection.

So if this is happening in parallel, what happens if queries start getting interleaved?

request 1: BEGIN;
request 1: INSERT INTO foo (a, b) VALUES (?, ?);
request 2: BEGIN;
request 1: INSERT INTO bar (c, d) VALUES (?, ?);
request 2: INSERT INTO foo (a, b) VALUES (?, ?);
request 1: COMMIT;
request 2: INSERT INTO bar (c, d) VALUES (?, ?);
request 2: COMMIT;

Now this gets quite bad. We can kinda extrapolate into how this can be even worse since we’re crossing transactiosn wiht different data, COMMITs and ROLLBACK’s happening over the same transaction implicitly between discrete requests.

So personally, I think uses of Drizzle need to adopt a our Client model.

So something more like this:

import { drizzle } from "drizzle-orm/planetscale-serverless";
import { Client } from "@planetscale/database";

// create the Client
const client = new Client({
  host: process.env["DATABASE_HOST"],
  username: process.env["DATABASE_USERNAME"],
  password: process.env["DATABASE_PASSWORD"],
});

function db() {
  return drizzle(client.connection());
}

So key distinction here, client is fine to be global and shared. Client doesn’t store any state. Whenever you want a new connection, using db() gets a fresh one. In our example, we can translate that to something like:

function handleHTTPRequest(req) {
  // get a new Connection for this entire request
  const conn = db();
  // now we can safely do anything sequentially over this single `conn` in here.
}

This alone solves a cross-request contamination and state being intermingled between multiple discrete requests.

Similarly, this helps provide the building block to doing parallelism correctly as well.

await Promise.all([
  db().execute("select 1"),
  db().execute("select 2"),
])

now we are correctly executing these queries in parallel because each one gets a unique Connection and their state isn’t clobbering each other.

It’s probably worth noting that it’s the developer’s responsibility. to know when to create a unique Connection and when it can be reused.

If you’re intentionally sharing state, the most common case is transactions, you need to reuse the same connection. If you’re setting variables with SET, switching database with USE, etc, these all require the state to be persisted on the Connection or it’s lost. For example:

db().execute("begin");
db().execute("insert ...");
db().execute("commit");

Will not do anything useful since the INSERT won’t be done within the transaction.

const conn = db();
conn.execute("begin");
conn.execute("insert ...");
conn.execute("commit");

is now correct.

Similarly it’s not possible to do concurrency within a transaction, such as:

const conn = db();
await conn.execute("begin");
await Promise.all([
  conn.execute("insert ..."),
  conn.execute("insert ..."),
]);
await conn.execute("commit");

This fundamentally does not work as you want and breaks a lot of guarantees around transactional safety. Operations within a transaction are required to be executed sequentially against the database. They ultimately are, but the order isn’t predictable and it’s potential for undefined behaviors.

I hope this helps, but in summary, I think we need to encourage more of a “connection factory” pattern, rather than a single global shared connection, and this would help mitigate a lot of misuse. So using a db() type function, rather than a const db that is reused over and over. A new connection in our driver is essentially free, so there should be no concern with doing this egregiously. A Connection is just an object with a unique internal State object. It does not translate to initiating a new TCP connection or anything like that.