typeorm: Performance problems when saving multiple documents

Problem

In our system we noticed slowdowns when we needed to insert a large number of entities (up to 10,000). To pinpoint the exact problem, we created a test script to measure the insert speed of TypeORM.

For the test we used the save() method inside a transaction (see here). The screenshot below shows the results of the test runs with different numbers of documents.

chart 1
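For reference, a minimal sketch of the save()-inside-a-transaction call being measured, assuming the Document entity shown under Environment; the document generation, file path, and timing code are illustrative, not the original test script.

import { getManager } from 'typeorm';
import { Document } from './entity/Document'; // assumed path

// Illustrative only: build `count` dummy documents matching the entity below.
function makeDocuments(count: number): Document[] {
  return Array.from({ length: count }, (_, i) => {
    const doc = new Document();
    doc.id = `doc-${i}`;
    doc.docId = `${i}`;
    doc.label = `label-${i}`;
    doc.context = 'benchmark';
    doc.distributions = [{ weight: '0.5', id: i, docId: i }];
    doc.date = new Date();
    return doc;
  });
}

// Save all documents with a single save() call inside one transaction and
// report how long it took in milliseconds.
async function timedSave(count: number): Promise<number> {
  const docs = makeDocuments(count);
  const start = Date.now();
  await getManager().transaction(async manager => {
    await manager.save(docs);
  });
  return Date.now() - start;
}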

We compare against raw queries using pg-promise. The expected behavior is that TypeORM is slower than pg-promise but still scales linearly. What the graph shows instead is that TypeORM exhibits O(n^2) runtime complexity: inserting 16,000 documents takes about 7.5 minutes with TypeORM.
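A raw pg-promise baseline might look roughly like the sketch below; the connection string, table name, and column names are assumptions derived from the Document entity shown under Environment, not the actual benchmark code.

import pgPromise from 'pg-promise';
import { Document } from './entity/Document'; // assumed path

const pgp = pgPromise();
const db = pgp('postgres://user:pass@localhost:5432/testdb'); // assumed connection string

// Insert all documents in one transaction, batching the individual INSERTs.
async function rawInsert(docs: Document[]): Promise<void> {
  await db.tx(t =>
    t.batch(
      docs.map(d =>
        t.none(
          'INSERT INTO document(id, "docId", label, context, distributions, date) VALUES ($1, $2, $3, $4, $5, $6)',
          [d.id, d.docId, d.label, d.context, JSON.stringify(d.distributions), d.date],
        ),
      ),
    ),
  );
}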

The table below gives a more detailed view (times are in milliseconds); all numbers are averages over 10 runs.

| Documents | PG-Batch | TypeORM | PG-Promise | PG-InsertHelper |
|-----------|----------|----------|------------|-----------------|
| 500 | 179 | 824 | 250 | 54 |
| 1000 | 268 | 1931 | 469 | 93 |
| 2000 | 476 | 12602 | 993 | 130 |
| 4000 | 863.1 | 58696 | 2015.7 | 216.5 |
| 8000 | 1618.9 | 116990.8 | 3886.1 | 385.9 |
| 16000 | 3281.2 | 438171.8 | 7706.2 | 736.8 |

As the table shows, performance is acceptable when inserting a smaller number of documents.

After every test run the Node.js process was restarted. If all test runs are done in the same process (in a loop), the results are slightly better on average, but the O(n^2) runtime complexity remains. See the Google Sheet, Machine 2 (continuous), for those results.

Solution

Currently not known.

Environment

  • TypeORM: 0.1.1
  • Database: Postgres 9.6
  • Node: 6.11
  • OS: Ubuntu

Entity used:

import { Entity, PrimaryColumn, Column } from 'typeorm';

@Entity()
export class Document {
	@PrimaryColumn('text')
	id: string;

	@Column('text')
	docId: string;

	@Column('text')
	label: string;

	@Column('text')
	context: string;

	@Column({type: 'jsonb'})
	distributions: Distribution[];

	@Column({type: 'timestamp with time zone'})
	date: Date;
}

export interface Distribution {
	weight: string,
	id: number,
	docId: number
}

More information

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 17
  • Comments: 22 (13 by maintainers)

Most upvoted comments

Sharing this here as it might help someone who is still seeing slow execution times. Our solution was to manually chunk our entities using the chunk method from lodash instead of the built-in TypeORM chunk option (250 items per chunk delivered the fastest execution time for our use case). After that we saved them in a loop using TypeORM's .insert() method instead of .save(), as it was faster. Basic structure example:

import { chunk } from 'lodash';
import { getRepository } from 'typeorm';

const customers = await getRepository(Customer).find({});
const cosaArray = customers.map((customer) => CustomerOwnSupplierAccess.Create(supplier, customer));
const chunked = chunk(cosaArray, 250);

await Promise.all(
  chunked.map(async (chunkedCosaArray) => {
    await getRepository(CustomerOwnSupplierAccess).insert(chunkedCosaArray);
  }),
);

I added this table to show the execution times we got with each approach.

| Type of approach | Execution time |
|------------------|----------------|
| .save() without any chunks, inside a transaction | ~5 minutes |
| .save() without any chunks, without transaction | ~3.5 minutes |
| .save() with the TypeORM chunk option (we tried all kinds of sizes, min 50 to max 1000), without transaction | ~2.5 minutes |
| .insert() with the TypeORM chunk option (we tried all kinds of sizes, min 50 to max 1000), without transaction | ~1.5 minutes |
| .save() with lodash chunks of entities (250 items), without transaction | ~35 seconds |
| .insert() with lodash chunks of entities (250 items), without transaction | ~25 seconds |

No, you can’t. Use the alpha version; there is nothing wrong with it.

I made even more fixes in 0.2.0-alpha.5, and here are the results for inserting 10,000 documents:

### Run 1
[PG-Batch] Call to persist took 4196 milliseconds.
[TypeormInsert] Call to persist took 728 milliseconds.
[TypeormSave] Call to persist took 12290 milliseconds.
[PG-InsertHelper] Call to persist took 1796 milliseconds.
Finished test.
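Assuming the TypeormInsert and TypeormSave labels correspond to the repository insert() and save() methods, the difference between the two calls is roughly the following sketch (not the actual benchmark code; the entity path is assumed):

import { getRepository } from 'typeorm';
import { Document } from './entity/Document'; // assumed path

async function compareInsertAndSave(docs: Document[]) {
  // save() decides per entity whether to INSERT or UPDATE, which involves
  // extra work; this is the slow path in the measurements above.
  await getRepository(Document).save(docs);

  // insert() just executes INSERT statements; this is the fast path above.
  await getRepository(Document).insert(docs);
}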

@MaikelH I was referring to the PG-Batch example, the one that uses db directly and then uses Promise.all, not the one with t.batch 😃 When you do it like that, you force all available connections out of the pool, so none will be available to do anything else in parallel until the operation is over.

The approach with the task/tx will use only one physical connection from the pool. See Chaining Queries.

From that article:

If you do not follow the advised approach, your application will perform better under a small load, due to more connections allocated in parallel, but under a heavy load it will quickly deplete the connection pool, crippling performance and scalability of your application.
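A minimal sketch of the two patterns being contrasted; the table and connection details are hypothetical:

import pgPromise from 'pg-promise';

const pgp = pgPromise();
const db = pgp('postgres://user:pass@localhost:5432/testdb'); // assumed connection string

// Pattern being warned about: every query goes straight to `db`, so the pool
// hands out as many physical connections as it can until the whole batch is done.
async function insertViaPool(ids: string[]) {
  await Promise.all(ids.map(id => db.none('INSERT INTO items(id) VALUES ($1)', [id])));
}

// Advised pattern: one task/transaction, one physical connection, queries chained on `t`.
async function insertViaTx(ids: string[]) {
  await db.tx(t => t.batch(ids.map(id => t.none('INSERT INTO items(id) VALUES ($1)', [id]))));
}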