sequelize: Transactions extremely slow when inserting 1000s of records using findOrCreate
Hello,
Great work on the library! It’s been a lifesaver so far and I’ve been able to figure everything out through the docs. I’m learning my way around transactions and it’s been a bit of a mind-opener. I recently hit a strange problem though and would appreciate any help.
I’m trying to load several thousand Twitter accounts into the database simultaneously. I wrapped the operation in a transaction, and my application slowed to a crawl. Below is the relevant code:
return sequelize.transaction(function (t) {
  console.log('Started transaction for list ' + count)
  return sequelize.Promise.map(members, function (member) {
    return Tweeter.findOrCreate({where: {id: member.id}, defaults: member}, {transaction: t})
    // return Tweeter.upsert(member)
  })
})
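For what it’s worth, in recent Sequelize versions findOrCreate takes a single options object, so the transaction would normally be passed alongside where and defaults rather than as a second argument. A minimal sketch under that assumption:

// Hedged sketch: assumes a Sequelize version where findOrCreate accepts
// a single options object ({ where, defaults, transaction }).
return sequelize.transaction(function (t) {
  return sequelize.Promise.map(members, function (member) {
    return Tweeter.findOrCreate({
      where: { id: member.id }, // look up by primary key
      defaults: member,         // values used only when the row has to be created
      transaction: t            // run the internal SELECT/INSERT inside the outer transaction
    })
  })
})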
In my application, without the transaction, inserting the several thousand accounts only took ~30 seconds. However, once I wrapped it in the transaction it took up to 5-10 minutes. I prepared a test repo (instructions below) which only performs this operation. In this isolated environment, I found that without the transaction, the operation would finish in 8-9s, but with a transaction it would take 30-40s.
There are several things I’m confused about:
- Why would this operation take my application several minutes? For only several thousand rows, amounting to ~6MB of data, it seems ridiculous that it would take so long.
- Why the great discrepancy between the isolated test case and the application results? In the application, I’m saving 16 arrays totaling ~3000 accounts: the last 2 contain ~1000 accounts each, and the first 14 together contain ~1000. Those first 14 arrays save in ~35s, and then, as I mentioned, I’m stuck waiting 5-10m for the last 2. The test case saves just those 2 arrays of accounts and takes 30s.
- Is it normal for transactions around findOrCreate to take 3x longer?
- Something I noticed was that I sometimes hit the memory limit on my computer during these operations. Could that cause the transaction to fail? Could I be filling up some kind of buffer of operations? Since I have a transaction, I can’t commit until all 1000 operations can commit, right? So maybe Postgres is attempting to hold all the locks from each operation in memory but keeps failing until the system idles some other processes? Any ideas on how I might be able to test this?
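One way to probe the memory question, purely as a sketch: in these Sequelize versions sequelize.Promise is Bluebird, and Bluebird’s Promise.map accepts a concurrency option, so you can cap how many findOrCreate calls are in flight at once and see whether the slowdown tracks memory pressure.

// Hedged sketch: assumes sequelize.Promise is Bluebird, whose Promise.map
// supports a { concurrency } option to limit the number of in-flight queries.
return sequelize.transaction(function (t) {
  return sequelize.Promise.map(members, function (member) {
    return Tweeter.findOrCreate({where: {id: member.id}, defaults: member}, {transaction: t})
  }, { concurrency: 50 }) // at most 50 pending findOrCreate calls at a time; tune as needed
})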
For the test case repo, all you need to do to run it is create a file called config.json and add your database URI:
{
  "db": "postgres://pass@host:port/db"
}
By default, the code will attempt to do the transaction, but on line 35 of index.js you can change transaction() to noTransaction() to test the other case.
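(Purely illustrative, not the repo’s actual code: the URI from config.json would typically just be handed to the Sequelize constructor, roughly like this.)

// Hedged sketch of the wiring assumed by the test repo; file and variable
// names here are illustrative, not copied from index.js.
var config = require('./config.json')
var Sequelize = require('sequelize')

var sequelize = new Sequelize(config.db, {
  logging: false // keep the console quiet while timing the inserts
})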
About this issue
- State: closed
- Created 9 years ago
- Reactions: 4
- Comments: 34 (26 by maintainers)
findOrCreate uses a transaction internally; try findCreateFind if you’re not using an outer transaction.

So how do you deal with bulk creates that have unique requirements in Postgres? Inserted rows may conflict with data already in the database, and Postgres cannot use updateOnDuplicate or ignoreDuplicates with bulkCreate. findCreateFind still needs 2 seconds for 100 rows with a 3-column unique index. I’ll try findOrCreate with a transaction.

You are not using transactions and inserting 100 records takes 3s? @frogcjn
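For reference, findCreateFind takes the same options shape as findOrCreate; a minimal sketch of the call, assuming the { where, defaults } form:

// Hedged sketch: findCreateFind uses the same { where, defaults } options as
// findOrCreate but skips the internal transaction (find, then create, then
// find again if the create hits a unique-constraint violation).
return sequelize.Promise.map(members, function (member) {
  return Tweeter.findCreateFind({where: {id: member.id}, defaults: member})
})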
For ETL stuff you might want to stay away from transactions; transactions are great for ensuring data integrity, which might not be needed if you run an ETL (where you can possibly start from scratch every time).
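If integrity really is secondary (the load can be re-run from scratch), one option along those lines is to drop both the transaction and the per-row lookup and insert in chunks with bulkCreate; a rough sketch, with chunk() as a hypothetical helper:

// Hedged sketch of an ETL-style load: no transaction, no per-row lookup,
// just chunked bulkCreate calls. chunk() is a helper defined here, not Sequelize API.
function chunk(items, size) {
  var out = []
  for (var i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

// Assumes sequelize.Promise is Bluebird, whose Promise.each runs the batches serially.
return sequelize.Promise.each(chunk(members, 500), function (batch) {
  return Tweeter.bulkCreate(batch) // will fail on duplicates unless the table is emptied first
})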