azure-sdk-for-js: [Cosmos DB] Memory leak

Describe the bug

A memory leak was introduced between versions 3.9.2 and 3.14.1. We know this because we started noticing this pattern on the memory consumption of our web apps, after just bumping the library. We deployed the update on November 20.

Screenshot 2021-12-09 153337

To Reproduce

Run the following snippet twice: first with version = '3.9.2' and then with version = '3.14.1'. It will run the same query 100 times and take heap snapshots before and after:

const version = '3.9.2';
// const version = '3.14.1';

const { CosmosClient } = require(`./cosmos-${version}/node_modules/@azure/cosmos`);
const heapdump = require('heapdump');

async function doit() {
  const client = new CosmosClient({
    endpoint: 'ENDPOINT',
    key: 'KEY'
  });

  const container = client.database('people')
    .container('people');

  const res = await container.items.readAll().fetchAll();
  return res.resources.map(p => p.id);
}

async function main() {
  for (let i = 0; i < 5; i++) {
    await doit();
  }

  heapdump.writeSnapshot(`dump-${version}-1.heapsnapshot`);

  for (let i = 0; i < 100; i++) {
    await doit();
  }

  heapdump.writeSnapshot(`dump-${version}-2.heapsnapshot`);
}

main()

Here are the results: heaps.zip

3.9.2 - Nothing extraordinary

Screenshot 2021-12-09 153426

3.14.1 - A lot of x100 allocations, indicating a leak

Screenshot 2021-12-09 153447

Expected behavior

No leaks.

Additional context

We did further analysis and found the culprit commit: https://github.com/Azure/azure-sdk-for-js/commit/d3647f86a9f8e29f6376ef1e04015dd5e94d63a7, which introduced the issue in 3.12.0. This commit quietly changes the documentation on connectionPolicy.enableEndpointDiscovery to say:

Flag to enable/disable automatic redirecting of requests based on read/write operations. Default true. Required to call client.dispose() when this is set to true after destroying the CosmosClient inside another process or in the browser.

This points to a workaround we can use today: to call client.dispose() on clients we no longer need. I verified that it does indeed avoid the leak. We will do that immediately on our service.

But the fact remains:

  • This option should not be the default as it stands since it causes a severe leak;
  • Or, alternatively, this dramatic change in behavior would warrant a major version bump and documentation highlights to remember user to call client.dispose().

Note that today there is no reference to the client.dispose() method on the npm page or the Getting Started guide.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 8
  • Comments: 16 (8 by maintainers)

Most upvoted comments

@sajeetharan what is it related to? Can you forward this issue to the right location? This is a high quality report that identifies a key problem and we should not lose track of it.

I think there should at least be better highlighting of this issue in the samples/README. Eventually this might be a good use case for this proposed ECMAScript feature: https://github.com/tc39/proposal-explicit-resource-management

Here’s the outcome of applying the workaround: MicrosoftTeams-image

Oh hi @xirzec! I didn’t even click the profile to see who this was! Long time no see!