dynamoose: [BUG] Slow scan operations

Summary:

I’m using Dynamoose for my project, and overall it works well. However, scan operations are several times slower than the same operations using the AWS DocumentClient.

Code sample:

Schema

It’s the same with all my different schemas. Here is an example of the simplest one. None of them use the Buffer type.

const userSchema = new dynamoose.Schema(
  {
    id: {
      type: String,
      required: true,
      hashKey: true,
    },
    phoneNumber: {
      type: String,
      index: {
        project: true,
        global: true,
        name: 'UserPhoneNumberIndex',
      },
    },
    name: String,
    status: String,
    email: String,
    // a few more fields with strings, numbers and booleans
  },
  { timestamps: true }
)

Model

const MyModel = dynamoose.model('myTable', userSchema, {
  create: false,
  waitForActive: false,
})

General

// dynamoose
MyModel.scan().all().exec()

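// DocumentClient
// `constants` and `getDocumentClient` are project helpers that are not shown here
import * as AWS from 'aws-sdk'
import { PromiseResult } from 'aws-sdk/lib/request'
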
export const scanTable = async <T = unknown>(
  table: string,
  extraParams?: Partial<AWS.DynamoDB.DocumentClient.ScanInput>
): Promise<T[]> => {
  const params: AWS.DynamoDB.DocumentClient.ScanInput = {
    TableName: `${constants.ENV}-${table}`,
    ...extraParams,
  }

  const scanResults: T[] = []
  let response: PromiseResult<
    AWS.DynamoDB.DocumentClient.ScanOutput,
    AWS.AWSError
  >
  do {
    // Pages are fetched one after another: each request needs the previous LastEvaluatedKey
    // eslint-disable-next-line no-await-in-loop
    response = await getDocumentClient().scan(params).promise()
    // Items is optional in ScanOutput, so fall back to an empty array
    scanResults.push(...((response.Items ?? []) as T[]))
    params.ExclusiveStartKey = response.LastEvaluatedKey
  } while (typeof response.LastEvaluatedKey !== 'undefined')

  return scanResults
}

Current output and behavior (including stack trace):

Example 1: Scanning ~450 items that are ~1 KB each (according to the AWS DynamoDB console).

  • AWS DocumentClient, scanning all: ~400 ms
  • Dynamoose scan all: ~4500 ms

Example 2: Scanning ~23000 items that are ~230 bytes each.

  • AWS DocumentClient, scanning all: ~6000 ms
  • Dynamoose scan all: ~24000 ms

Expected output and behavior:

Somewhat similar performance

Environment:

  • Operating System: Amazon Linux
  • Operating System Version: 2
  • Node.js version (node -v): v14.x
  • NPM version (npm -v): 7.18.1
  • Dynamoose version: 2.8.1

Other information (if applicable):

AWS lambda using the Serverless framework.

  • Serverless version: 2.48.0
  • AWS SDK version: 2.952.0
  • AWS_NODEJS_CONNECTION_REUSE_ENABLED: 1

Other:

  • [ x ] I have read through the Dynamoose documentation before posting this issue
  • [ x ] I have searched through the GitHub issues (including closed issues) and pull requests to ensure this issue has not already been raised before
  • [ x ] I have searched the internet and Stack Overflow to ensure this issue hasn’t been raised or answered before
  • [ x ] I have tested the code provided and am confident it doesn’t work as intended
  • [ x ] I have filled out all fields above
  • [ x ] I am running the latest version of Dynamoose

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 7
  • Comments: 16 (6 by maintainers)

Most upvoted comments

@fishcharlie I can confirm that deep_copy is taking a significant performance toll when querying data sets and objects. I have provided a repository that provides a small reproducible example using DynamoDB Local:

https://github.com/tranhl/dynamooose-deep-copy-repro

As @PaulAtST has identified in this comment, deep_copy is the primary culprit, due to the objectUtils.isCircular call. While deep_copy is copying the source object, it calls isCircular to check whether encountered objects contain circular references. This is essentially an O(n^2) operation, because isCircular also traverses the object to determine whether circular references exist, so nested objects are walked again at every level of the copy.
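To make the cost concrete, here is a rough sketch of the pattern described above. It is not Dynamoose's code: `hasCycle`, `naiveDeepCopy`, `buildChain` and the `visits` counter are hypothetical stand-ins for isCircular and deep_copy. The total work is the sum of all subtree sizes, which becomes quadratic in the worst case of a deeply nested object.

```typescript
// Hypothetical stand-ins for illustration only; not Dynamoose's implementation.
let visits = 0; // number of objects touched by the cycle check

function isObject(value: unknown): value is object {
  return typeof value === "object" && value !== null;
}

// Walks the whole subtree below `value` looking for a reference to an ancestor.
function hasCycle(value: unknown, ancestors: Set<object> = new Set()): boolean {
  if (!isObject(value)) return false;
  visits++;
  if (ancestors.has(value)) return true;
  ancestors.add(value);
  const found = Object.values(value).some((child) => hasCycle(child, ancestors));
  ancestors.delete(value);
  return found;
}

// Copies `value`, but re-runs the full cycle check at every nested object.
function naiveDeepCopy(value: unknown): unknown {
  if (!isObject(value)) return value;
  if (hasCycle(value)) return undefined; // drop circular structures
  if (Array.isArray(value)) return value.map((item) => naiveDeepCopy(item));
  const copy: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value)) {
    copy[key] = naiveDeepCopy(child);
  }
  return copy;
}

// Builds `depth` objects nested inside each other.
function buildChain(depth: number): Record<string, unknown> {
  let node: Record<string, unknown> = { leaf: true };
  for (let i = 1; i < depth; i++) node = { child: node };
  return node;
}

visits = 0;
naiveDeepCopy(buildChain(100));
console.log(visits); // 100 + 99 + ... + 1 = 5050 visits for only 100 objects
```

For wide, shallow items the overhead is closer to (number of objects × nesting depth), so how much this hurts depends on the shape of the stored documents.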

Instead of relying on isCircular to detect and omit circular references, we should instead handle this ourselves within deep_copy, so that we only traverse the input object once. The provided reproduction repository contains a patched version of Dynamoose that does this, with the profiling results speaking for themselves.
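A minimal sketch of that single-traversal idea, assuming plain objects and arrays only. This is not the patch from the reproduction repository; `deepCopyOnce`, `copyNode` and the `CIRCULAR` sentinel are hypothetical names, and types such as Date, Buffer, Map and Set would need their own handling.

```typescript
// Sentinel returned when a value points back to one of its own ancestors.
const CIRCULAR = Symbol("circular");

// Copies `value` while tracking the objects on the current path, so each
// object is visited once and circular references are dropped along the way.
function copyNode(value: unknown, ancestors: Set<object>): unknown {
  if (typeof value !== "object" || value === null) return value;
  if (ancestors.has(value)) return CIRCULAR;

  ancestors.add(value);
  let copy: unknown;
  if (Array.isArray(value)) {
    copy = value
      .map((item) => copyNode(item, ancestors))
      .filter((item) => item !== CIRCULAR);
  } else {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      const copied = copyNode(child, ancestors);
      if (copied !== CIRCULAR) out[key] = copied; // omit circular properties
    }
    copy = out;
  }
  ancestors.delete(value); // siblings may legitimately share this object
  return copy;
}

function deepCopyOnce<T>(value: T): T {
  return copyNode(value, new Set<object>()) as T;
}

// Usage: the self-reference is omitted, everything else is copied.
const user: Record<string, unknown> = { id: "1", tags: ["a", "b"] };
user.self = user;
console.log(deepCopyOnce(user)); // { id: '1', tags: [ 'a', 'b' ] }
```

Tracking only the current path (rather than every object seen so far) means an object that appears twice without forming a cycle is still copied in both places.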

We were experiencing the same performance issues with a large query request. After upgrading to the v3 alpha, performance is now on par with the native client.

Hi,

I’ve got the same problem using Dynamoose on AWS Lambda; on localhost (Ubuntu) it works perfectly.

After some investigation I have found what causes the issue: this line https://github.com/dynamoose/dynamoose/blob/a838224d031ba32db6f84d427600beda5ec765ed/lib/DocumentRetriever.ts#L59

Processing 150 records took 10s (sic!)