typeorm: Performance problems when saving multiple documents

Problem

In our system we noticed slowdowns when we needed to insert a large number of entities (up to 10,000). To pinpoint the exact problem, we created a test script to measure the insert speed of TypeORM.

For the test we used the save() method inside a transaction (see here). The screenshot below shows the results of the test runs with different numbers of documents.

chart 1
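For reference, a minimal sketch of the save()-inside-a-transaction call being measured, assuming the Document entity shown under Environment; the document generation, file path, and timing code are illustrative, not the original test script.

import { getManager } from 'typeorm';
import { Document } from './entity/Document'; // assumed path

// Illustrative only: build `count` dummy documents matching the entity below.
function makeDocuments(count: number): Document[] {
  return Array.from({ length: count }, (_, i) => {
    const doc = new Document();
    doc.id = `doc-${i}`;
    doc.docId = `${i}`;
    doc.label = `label-${i}`;
    doc.context = 'benchmark';
    doc.distributions = [{ weight: '0.5', id: i, docId: i }];
    doc.date = new Date();
    return doc;
  });
}

// Save all documents with a single save() call inside one transaction and
// report how long it took in milliseconds.
async function timedSave(count: number): Promise<number> {
  const docs = makeDocuments(count);
  const start = Date.now();
  await getManager().transaction(async manager => {
    await manager.save(docs);
  });
  return Date.now() - start;
}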

We compare against raw queries using pg-promise. The expected behavior is that TypeORM is slower than pg-promise but still scales linearly. What the graph shows instead is that TypeORM exhibits O(n^2) runtime complexity: inserting 16,000 documents takes about 7.5 minutes with TypeORM.
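A raw pg-promise baseline might look roughly like the sketch below; the connection string, table name, and column names are assumptions derived from the Document entity shown under Environment, not the actual benchmark code.

import pgPromise from 'pg-promise';
import { Document } from './entity/Document'; // assumed path

const pgp = pgPromise();
const db = pgp('postgres://user:pass@localhost:5432/testdb'); // assumed connection string

// Insert all documents in one transaction, batching the individual INSERTs.
async function rawInsert(docs: Document[]): Promise<void> {
  await db.tx(t =>
    t.batch(
      docs.map(d =>
        t.none(
          'INSERT INTO document(id, "docId", label, context, distributions, date) VALUES ($1, $2, $3, $4, $5, $6)',
          [d.id, d.docId, d.label, d.context, JSON.stringify(d.distributions), d.date],
        ),
      ),
    ),
  );
}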

The table below gives a more detailed view (times are in milliseconds); all numbers are averages over 10 runs.

| Documents | PG-Batch | TypeORM | PG-Promise | PG-InsertHelper |
|-----------|----------|----------|------------|-----------------|
| 500 | 179 | 824 | 250 | 54 |
| 1000 | 268 | 1931 | 469 | 93 |
| 2000 | 476 | 12602 | 993 | 130 |
| 4000 | 863.1 | 58696 | 2015.7 | 216.5 |
| 8000 | 1618.9 | 116990.8 | 3886.1 | 385.9 |
| 16000 | 3281.2 | 438171.8 | 7706.2 | 736.8 |

As the table shows, performance is acceptable when inserting a smaller number of documents.

After every test run the Node.js process was restarted. If all test runs are done in the same process (in a loop), the results are slightly better on average, but the O(n^2) runtime complexity remains. See the Google Sheet, Machine 2 (continuous), for those results.

Solution

Currently not known.

Environment

  • TypeORM: 0.1.1
  • Database: Postgres 9.6
  • Node: 6.11
  • OS: Ubuntu

Entity used:

import { Entity, PrimaryColumn, Column } from 'typeorm';

@Entity()
export class Document {
	@PrimaryColumn('text')
	id: string;

	@Column('text')
	docId: string;

	@Column('text')
	label: string;

	@Column('text')
	context: string;

	@Column({type: 'jsonb'})
	distributions: Distribution[];

	@Column({type: 'timestamp with time zone'})
	date: Date;
}

export interface Distribution {
	weight: string,
	id: number,
	docId: number
}

More information

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 17
  • Comments: 22 (13 by maintainers)

Most upvoted comments

Sharing this here as it might help someone who is still seeing slow execution times. Our solution was to manually chunk our entities using the chunk method from lodash instead of the built-in TypeORM chunk option (250 items per chunk delivered the fastest execution time for our use case). After that we saved them in a loop using TypeORM's .insert() method instead of .save(), as it was faster. Basic structure example:

import { chunk } from 'lodash';
import { getRepository } from 'typeorm';

const customers = await getRepository(Customer).find({});
const cosaArray = customers.map((customer) => CustomerOwnSupplierAccess.Create(supplier, customer));
const chunked = chunk(cosaArray, 250);

await Promise.all(
  chunked.map(async (chunkedCosaArray) => {
    await getRepository(CustomerOwnSupplierAccess).insert(chunkedCosaArray);
  }),
);

I added this table to show the execution times we got with each approach.

| Type of approach | Execution time |
|------------------|----------------|
| .save() without any chunks, inside a transaction | ~5 minutes |
| .save() without any chunks, without transaction | ~3.5 minutes |
| .save() with the TypeORM chunk option (we tried all kinds of sizes, min 50 to max 1000), without transaction | ~2.5 minutes |
| .insert() with the TypeORM chunk option (we tried all kinds of sizes, min 50 to max 1000), without transaction | ~1.5 minutes |
| .save() with lodash chunks of entities (250 items), without transaction | ~35 seconds |
| .insert() with lodash chunks of entities (250 items), without transaction | ~25 seconds |

No, you can’t. Use the alpha version; there is nothing wrong with it.

I made even more fixes in 0.2.0-alpha.5, and here are the results for inserting 10,000 documents:

### Run 1
[PG-Batch] Call to persist took 4196 milliseconds.
[TypeormInsert] Call to persist took 728 milliseconds.
[TypeormSave] Call to persist took 12290 milliseconds.
[PG-InsertHelper] Call to persist took 1796 milliseconds.
Finished test.
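Assuming the TypeormInsert and TypeormSave labels correspond to the repository insert() and save() methods, the difference between the two calls is roughly the following sketch (not the actual benchmark code; the entity path is assumed):

import { getRepository } from 'typeorm';
import { Document } from './entity/Document'; // assumed path

async function compareInsertAndSave(docs: Document[]) {
  // save() decides per entity whether to INSERT or UPDATE, which involves
  // extra work; this is the slow path in the measurements above.
  await getRepository(Document).save(docs);

  // insert() just executes INSERT statements; this is the fast path above.
  await getRepository(Document).insert(docs);
}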

@MaikelH I was referring to the PG-Batch example, the one that uses db directly and then uses Promise.all, not the one with t.batch 😃 When you do it like that, you force all available connections out of the pool, so none will be available to do anything else in parallel until the operation is over.

The approach with the task/tx will use only one physical connection from the pool. See Chaining Queries.

From that article:

If you do not follow the advised approach, your application will perform better under a small load, due to more connections allocated in parallel, but under a heavy load it will quickly deplete the connection pool, crippling performance and scalability of your application.
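A minimal sketch of the two patterns being contrasted; the table and connection details are hypothetical:

import pgPromise from 'pg-promise';

const pgp = pgPromise();
const db = pgp('postgres://user:pass@localhost:5432/testdb'); // assumed connection string

// Pattern being warned about: every query goes straight to `db`, so the pool
// hands out as many physical connections as it can until the whole batch is done.
async function insertViaPool(ids: string[]) {
  await Promise.all(ids.map(id => db.none('INSERT INTO items(id) VALUES ($1)', [id])));
}

// Advised pattern: one task/transaction, one physical connection, queries chained on `t`.
async function insertViaTx(ids: string[]) {
  await db.tx(t => t.batch(ids.map(id => t.none('INSERT INTO items(id) VALUES ($1)', [id]))));
}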