node-postgres: Curious behavior: `idleTimeoutMillis` works locally, but not on "serverless" environments

I was hesitant to open this issue since it might involve other things, not just pg, but I will give it a try:

Everything works as expected on localhost: after I release a client back to the Pool with client.release() and idleTimeoutMillis elapses, the Pool emits the remove event (signaling that it destroyed the client and closed the real connection to the Database). I can also connect from localhost to the same Database I use in Production (RDS Postgres) and it behaves as expected.

But on environments like Vercel (which uses AWS Lambdas under the hood), for some curious reason, idleTimeoutMillis does not seem to work as expected. After a Client is released, it appears to stay idle in the Pool forever: the client/connection is never destroyed once idleTimeoutMillis is reached, leaving a lot of idle connections hanging on RDS.

To make things stranger, if I force a client to close using client.release(true), the Pool emits the remove event and the connection is closed as expected… but forcing this on every client defeats the purpose of using a Pool in the first place.

I don’t know if the event loop behaves differently in this kind of environment, so that the internal pg timers never get a chance to fire, or something along those lines.

import { Pool } from 'pg';
import retry from 'async-retry';
// ...

const configurations = {
  // ...
  connectionTimeoutMillis: 5000,
  idleTimeoutMillis: 5000,
  max: 1,
  allowExitOnIdle: true // I've also tried with false
};

const pool = new Pool(configurations);

async function query(query, params) {
  let client;

  try {
    client = await tryToGetNewClientFromPool();
    return await client.query(query, params);
  } catch (error) {
    throw new ServiceError({
      // ...
    });
  } finally {
    if (client) {
      client.release();
    }
  }
}

// If all Database connections are maxed out, this function keeps
// retrying until it gets a Client from the Pool, then resolves
// with it.
async function tryToGetNewClientFromPool() {
  const clientFromPool = await retry(newClientFromPool, {
    retries: 15,
    minTimeout: 0,
    factor: 2,
  });

  return clientFromPool;

  async function newClientFromPool() {
    return await pool.connect();
  }
}

// ...
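
For anyone trying to reproduce this, here is a minimal sketch of how the idle cleanup can be observed through the Pool events (connection settings are omitted and come from the PG* environment variables; the log messages are just illustrative):

// Minimal sketch: log Pool lifecycle events to see whether idle clients
// are ever reaped. Connection settings come from PG* environment variables.
import { Pool } from 'pg';

const pool = new Pool({
  idleTimeoutMillis: 5000,
  max: 1,
});

pool.on('acquire', () => console.log('client acquired'));
pool.on('remove', () => console.log('client removed, connection closed'));

const client = await pool.connect();
await client.query('SELECT 1;');
client.release();
// Locally, "client removed, connection closed" is logged ~5s after release().
// client.release(true) destroys the client immediately instead of keeping it idle.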

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 21 (7 by maintainers)

Most upvoted comments

The final solution (already working in production) is to use the Pool as much as you can, but keep checking “opened connections” versus “available connections”, and if they’re too close, end() the Pool in the finally block instead of just release()ing the client.

My repository is still private, but here’s the full solution:

// database.js - exports a Singleton with `query()` and `getNewClient()` methods

import { Pool, Client } from 'pg';
import retry from 'async-retry';
import { ServiceError } from 'errors/index.js';

const configurations = {
  user: process.env.POSTGRES_USER,
  host: process.env.POSTGRES_HOST,
  database: process.env.POSTGRES_DB,
  password: process.env.POSTGRES_PASSWORD,
  port: process.env.POSTGRES_PORT,
  connectionTimeoutMillis: 5000,
  idleTimeoutMillis: 30000,
  max: 1,
  ssl: {
    rejectUnauthorized: false,
  },
  allowExitOnIdle: true,
};

// https://github.com/filipedeschamps/tabnews.com.br/issues/84
if (['test', 'development'].includes(process.env.NODE_ENV) || process.env.CI) {
  delete configurations.ssl;
}

const cache = {
  pool: null,
  maxConnections: null,
  reservedConnections: null,
  openedConnections: null,
  openedConnectionsLastUpdate: null,
};

async function query(query, params) {
  let client;

  try {
    client = await tryToGetNewClientFromPool();
    return await client.query(query, params);
  } catch (error) {
    const errorObject = new ServiceError({
      message: error.message,
      context: {
        query: query.text,
      },
      errorUniqueCode: 'INFRA:DATABASE:QUERY',
      stack: new Error().stack,
    });
    console.error(errorObject);
    throw errorObject;
  } finally {
    if (client) {
      const tooManyConnections = await checkForTooManyConnections(client);

      if (tooManyConnections) {
        client.release();
        await cache.pool.end();
        cache.pool = null;
      } else {
        client.release();
      }
    }
  }
}

async function tryToGetNewClientFromPool() {
  const clientFromPool = await retry(newClientFromPool, {
    retries: 50,
    minTimeout: 0,
    factor: 2,
  });

  return clientFromPool;

  async function newClientFromPool() {
    if (!cache.pool) {
      cache.pool = new Pool(configurations);
    }

    return await cache.pool.connect();
  }
}

async function checkForTooManyConnections(client) {
  const currentTime = new Date().getTime();
  const openedConnectionsMaxAge = 10000;
  const maxConnectionsTolerance = 0.9;

  if (cache.maxConnections === null || cache.reservedConnections === null) {
    const [maxConnections, reservedConnections] = await getConnectionLimits();
    cache.maxConnections = maxConnections;
    cache.reservedConnections = reservedConnections;
  }

  if (
    cache.openedConnections === null ||
    cache.openedConnectionsLastUpdate === null ||
    currentTime - cache.openedConnectionsLastUpdate > openedConnectionsMaxAge
  ) {
    const openedConnections = await getOpenedConnections();
    cache.openedConnections = openedConnections;
    cache.openedConnectionsLastUpdate = currentTime;
  }

  if (cache.openedConnections > (cache.maxConnections - cache.reservedConnections) * maxConnectionsTolerance) {
    return true;
  }

  return false;

  async function getConnectionLimits() {
    const [maxConnectionsResult, reservedConnectionResult] = await client.query(
      'SHOW max_connections; SHOW superuser_reserved_connections;'
    );
    return [
      maxConnectionsResult.rows[0].max_connections,
      reservedConnectionResult.rows[0].superuser_reserved_connections,
    ];
  }

  async function getOpenedConnections() {
    const openConnectionsResult = await client.query(
      'SELECT numbackends as opened_connections FROM pg_stat_database where datname = $1',
      [process.env.POSTGRES_DB]
    );
    return openConnectionsResult.rows[0].opened_connections;
  }
}

async function getNewClient() {
  try {
    const client = await tryToGetNewClient();
    return client;
  } catch (error) {
    const errorObject = new ServiceError({
      message: error.message,
      errorUniqueCode: 'INFRA:DATABASE:GET_NEW_CONNECTED_CLIENT',
      stack: new Error().stack,
    });
    console.error(errorObject);
    throw errorObject;
  }
}

async function tryToGetNewClient() {
  const client = await retry(newClient, {
    retries: 50,
    minTimeout: 0,
    factor: 2,
  });

  return client;

  // You need to close the client when you are done with it
  // using the client.end() method.
  async function newClient() {
    const client = new Client(configurations);
    await client.connect();
    return client;
  }
}

export default Object.freeze({
  query,
  getNewClient,
});
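
Usage is roughly like this (a sketch; the module path and the SQL are placeholders):

// Rough usage sketch; 'infra/database.js' and the SQL are placeholders.
import database from 'infra/database.js';

// Pooled query: the module acquires, uses and releases the client, and
// ends the Pool itself whenever opened connections get too close to the limit.
const result = await database.query('SELECT * FROM users WHERE id = $1;', [1]);

// Dedicated client (e.g. for a transaction): the caller must end() it.
const client = await database.getNewClient();
try {
  await client.query('BEGIN;');
  // ...
  await client.query('COMMIT;');
} finally {
  await client.end();
}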

Final results

First strategy

Open and close connection for every single query:

Time taken for tests:   78.256 seconds

Last strategy

Manage Pool based on opened vs available connections:

Time taken for tests:   20.968 seconds

@filipedeschamps Thank you so much for investigating this issue. I had the same problem and am quite happy with the following solution:

const MAX_SIGNED_32_BIT_INTEGER = 2147483647;
...
{
     idleTimeoutMillis: MAX_SIGNED_32_BIT_INTEGER,
     reapIntervalMillis: MAX_SIGNED_32_BIT_INTEGER,
}
...

This configuration effectively deactivates the reap cycle in tarn.js, meaning the pool never cleans up idle connections itself, regardless of whether the lambda process is hot or cold. Given the operational characteristics of lambdas, this in-process cleanup does not work reliably, so disabling it seems sensible.

Additionally, I set idle_session_timeout in PostgreSQL to 30,000 milliseconds (30 seconds) so that idle connections are cleaned up on the server side instead (PostgreSQL Documentation).
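
A sketch of how that timeout can be applied, assuming PostgreSQL 14 or newer (the role name is a placeholder):

// Sketch, assuming PostgreSQL 14+ (idle_session_timeout was added in v14).
// The role name 'app_user' is a placeholder.
//
// Option 1: set it once for the application role, server-side:
//   ALTER ROLE app_user SET idle_session_timeout = '30s';
//
// Option 2: set it per physical connection from node-postgres:
import { Pool } from 'pg';

const pool = new Pool({ /* connection settings omitted */ });

pool.on('connect', (client) => {
  // runs once for every new backing connection the pool opens
  client.query("SET idle_session_timeout = '30s'");
});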

I also implemented graceful shutdown listeners in the lambda to ensure all remaining connections are cleaned up before the process is terminated, although this step seems optional since the server would clean these up after 30 seconds anyway. More details can be found in this Stack Overflow discussion.
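
The listener is roughly the following sketch, reusing the pool from the snippet above (whether SIGTERM actually reaches the process depends on the Lambda runtime setup):

// Sketch: best-effort cleanup when the runtime signals shutdown.
// Whether SIGTERM is delivered to the handler process depends on the Lambda
// runtime/extension configuration, so the server-side timeout remains the safety net.
process.on('SIGTERM', async () => {
  try {
    await pool.end(); // closes idle clients and waits for checked-out ones to be released
  } finally {
    process.exit(0);
  }
});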

A good option is also to look into RDS Proxy to achieve an external db connection pool.

Btw, there is no concurrent client sharing in this case (and the poolUsageCount would always be 0 at the end of @sehrope’s handler), because AWS Lambda only sends a single request at a time to each instance.

I didn’t, and it’s a great point, but are you sure a “waiting to open” connection will throw this message? I thought it happened because a request called client.end(), flagging the client with _closing = true, and then a subsequent request found this still-existing object and tried to use it.

My mistake, the connection timeout sets the client.connection._ending flag, not client._ending: https://github.com/brianc/node-postgres/blob/21ccd4f1b6e66774bbf24aecfccdbfe7c9b49238/packages/pg/lib/client.js#L103-L107

In that case, I don’t think a connection timeout would result in that exact error message. If you don’t use the pool, that error should only be triggered by using a client after calling client.end(). A connection timeout might result in “Connection terminated unexpectedly” errors instead.
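
For illustration, a sketch of the client.end() case (the exact error message may vary between pg versions):

// Illustration: using a standalone client after end() is rejected by pg
// (roughly "Client was closed and is not queryable"; wording may vary by version).
import { Client } from 'pg';

const client = new Client(/* connection settings omitted */);
await client.connect();
await client.end();

await client.query('SELECT 1;'); // rejects: the client has already been closed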