badger: deadlock when writing into Badger using multiple routines
go version go1.13 linux/amd64 github.com/dgraph-io/badger v2.0.0-rc3+incompatible I have to admit I didn’t try latest master, but I’ve seen this issue since early versions of Badger 2.0 and it’s still there with rc3. The server is running on 56-core Intel Xeon E5-2660, no idea about the disk.
I’m reading several multi-gigabyte files in their own binary format and “converting” them into Badger, writing data encoded with Protobuf. The initial data is organized in batches (of arbitrary size, from 1 to 1000+ records in a batch), so I’m using WriteBatches to write the actual data and then badger.Update() to update the counters (stored in the same badger db). To speed up the process I’m using 64 goroutines that listen on channel with data and do the actual encoding/writing.
The problem is that after some time the badger deadlocks and stops writing anything, the process is stuck with the backtraces provided below. As you can see from the backtrace 3, I tried to use Txn.Set/Commit directly instead of using badger.Update, but it didn’t help. Unfortunately, I’m unable to provide a reproduce code so far, as the issue seems to be reproducible only on really large amounts of data after a couple of hours of running with no error messages whatsoever. To make it more complicated, I’m unable to reproduce it in 100% of cases even using the same initial data and the same Go code.
My code looks very much like this (in simplified go “pseudocode”):
//this is the function that's called from 64 worker routines
func LoadData(records []*Record) error {
wb := badger.NewWriteBatch()
defer wb.Cancel()
for ... {
err = wb.Set(key, value)
if err != nil {
return err
}
}
err = wb.Flush()
if err == nil {
updateStats(..)
}
return err
}
func updateStats(...) error {
statsLock.Lock()
defer statsLock.Unlock()
stats += value
//...serialize value into val
txn := badger.NewTransaction(true)
defer txn.Discard()
err = txn.Set(key, val)
if err != nil {
return err
}
err = txn.Commit()
if err != nil {
return err
}
return err
}
Backtraces: 1.
6 @ 0x43d600 0x44dcc0 0x44dcab 0x44d912 0x4762e4 0x976709 0x474c83 0x975cb2 0x975c5e 0x985549 0xa16b0b 0xa800ac 0x46b381
# 0x44d911 sync.runtime_Semacquire+0x41 /home/tony/go/src/runtime/sema.go:56
# 0x4762e3 sync.(*WaitGroup).Wait+0x63 /home/tony/go/src/sync/waitgroup.go:130
# 0x976708 github.com/dgraph-io/badger/y.(*Throttle).Finish.func1+0x38 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/y/y.go:292
# 0x474c82 sync.(*Once).doSlow+0xe2 /home/tony/go/src/sync/once.go:66
# 0x975cb1 sync.(*Once).Do+0x71 /home/tony/go/src/sync/once.go:57
# 0x975c5d github.com/dgraph-io/badger/y.(*Throttle).Finish+0x1d /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/y/y.go:291
# 0x985548 github.com/dgraph-io/badger.(*WriteBatch).Flush+0x78 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/batch.go:159
# 0xa16b0a go.badoo.dev/meetlist/db.(*DB).AddRomancesFromSnapshot+0x88a /local/eye/git/meetlist/db/db.go:691
# 0xa800ab main.snapshotWorker+0x24b /local/eye/git/meetlist/main.go:617
6 @ 0x43d600 0x44dcc0 0x44dcab 0x44d912 0x4762e4 0x9b5ff1 0x9bfbc0 0x9ae911 0x46b381
# 0x44d911 sync.runtime_Semacquire+0x41 /home/tony/go/src/runtime/sema.go:56
# 0x4762e3 sync.(*WaitGroup).Wait+0x63 /home/tony/go/src/sync/waitgroup.go:130
# 0x9b5ff0 github.com/dgraph-io/badger.(*request).Wait+0x30 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:938
# 0x9bfbbf github.com/dgraph-io/badger.(*Txn).commitAndSend.func1+0x3f /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:501
# 0x9ae910 github.com/dgraph-io/badger.runTxnCallback+0x50 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:572
1 @ 0x43d600 0x44d15b 0x97485c 0x9ac2d0 0x9aedc7 0xa13f8c 0xa13f72 0xa14406 0xa16b74 0xa800ac 0x46b381
# 0x97485b github.com/dgraph-io/badger/y.(*WaterMark).WaitForMark+0x13b /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/y/watermark.go:124
# 0x9ac2cf github.com/dgraph-io/badger.(*oracle).readTs+0xdf /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:119
# 0x9aedc6 github.com/dgraph-io/badger.(*DB).newTransaction+0xc6 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:666
# 0xa13f8b github.com/dgraph-io/badger.(*DB).NewTransaction+0x10b /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:634
# 0xa13f71 go.badoo.dev/meetlist/db.(*DB).writeStatsEntry+0xf1 /local/eye/git/meetlist/db/db.go:154
# 0xa14405 go.badoo.dev/meetlist/db.(*DB).updateStats+0xb5 /local/eye/git/meetlist/db/db.go:181
# 0xa16b73 go.badoo.dev/meetlist/db.(*DB).AddRomancesFromSnapshot+0x8f3 /local/eye/git/meetlist/db/db.go:694
# 0xa800ab main.snapshotWorker+0x24b /local/eye/git/meetlist/main.go:617
There are some more routines with their own traces, dunno if they are related to the issue or not:
1 @ 0x43d600 0x44d15b 0x98b827 0x46b381
# 0x98b826 github.com/dgraph-io/badger.(*DB).doWrites+0x2d6 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:734
1 @ 0x43d600 0x44d15b 0x98d800 0x46b381
# 0x98d7ff github.com/dgraph-io/badger.(*DB).updateSize+0x15f /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:1021
1 @ 0x43d600 0x44d15b 0x9a5eff 0x46b381
# 0x9a5efe github.com/dgraph-io/badger.(*publisher).listenForUpdates+0x17e /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/publisher.go:67
1 @ 0x43d600 0x44dcc0 0x44dcab 0x44da27 0x47483c 0x9b8c6f 0x9b8aab 0x99c9ce 0x99e556 0x99ef97 0x99a1d9 0x46b381
# 0x44da26 sync.runtime_SemacquireMutex+0x46 /home/tony/go/src/runtime/sema.go:71
# 0x47483b sync.(*Mutex).lockSlow+0xfb /home/tony/go/src/sync/mutex.go:138
# 0x9b8c6e sync.(*Mutex).Lock+0x1fe /home/tony/go/src/sync/mutex.go:81
# 0x9b8aaa github.com/dgraph-io/badger.(*valueLog).updateDiscardStats+0x3a /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:1385
# 0x99c9cd github.com/dgraph-io/badger.(*levelsController).compactBuildTables+0x208d /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:636
# 0x99e555 github.com/dgraph-io/badger.(*levelsController).runCompactDef+0xc5 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:791
# 0x99ef96 github.com/dgraph-io/badger.(*levelsController).doCompact+0x4b6 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:860
# 0x99a1d8 github.com/dgraph-io/badger.(*levelsController).runWorker+0x318 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:356
1 @ 0x43d600 0x44dcc0 0x44dcab 0x44da27 0x47483c 0x9b8f80 0x9b8cc0 0x9b8c02 0x99c9ce 0x99e556 0x99ef97 0x99a1d9 0x46b381
# 0x44da26 sync.runtime_SemacquireMutex+0x46 /home/tony/go/src/runtime/sema.go:71
# 0x47483b sync.(*Mutex).lockSlow+0xfb /home/tony/go/src/sync/mutex.go:138
# 0x9b8f7f sync.(*Mutex).Lock+0x2ef /home/tony/go/src/sync/mutex.go:81
# 0x9b8cbf github.com/dgraph-io/badger.(*valueLog).flushDiscardStats+0x2f /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:1403
# 0x9b8c01 github.com/dgraph-io/badger.(*valueLog).updateDiscardStats+0x191 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:1393
# 0x99c9cd github.com/dgraph-io/badger.(*levelsController).compactBuildTables+0x208d /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:636
# 0x99e555 github.com/dgraph-io/badger.(*levelsController).runCompactDef+0xc5 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:791
# 0x99ef96 github.com/dgraph-io/badger.(*levelsController).doCompact+0x4b6 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:860
# 0x99a1d8 github.com/dgraph-io/badger.(*levelsController).runWorker+0x318 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:356
1 @ 0x43d600 0x459807 0x4597dd 0x98af33 0x9bb65a 0x46b381
# 0x4597dc time.Sleep+0x12c /home/tony/go/src/runtime/time.go:105
# 0x98af32 github.com/dgraph-io/badger.(*DB).writeRequests+0x242 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:657
# 0x9bb659 github.com/dgraph-io/badger.(*DB).doWrites.func1+0x59 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:706
1 @ 0x43d600 0x459807 0x4597dd 0x99f7ab 0x98cf49 0x98d1bb 0x9bb1a7 0x46b381
# 0x4597dc time.Sleep+0x12c /home/tony/go/src/runtime/time.go:105
# 0x99f7aa github.com/dgraph-io/badger.(*levelsController).addLevel0Table+0x37a /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:907
# 0x98cf48 github.com/dgraph-io/badger.(*DB).handleFlushTask+0x7b8 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:925
# 0x98d1ba github.com/dgraph-io/badger.(*DB).flushMemtable+0x16a /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:941
# 0x9bb1a6 github.com/dgraph-io/badger.Open.func4+0x36 /home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:303
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 18 (8 by maintainers)
Commits related to this issue
- Fix deadlock in discard stats (#1070) Fixes #1032 Currently discardStats flow is as follows: * Discard Stats are generated during compaction. At the end, compaction routine updates these stats in... — committed to dgraph-io/badger by ashish-goswami 5 years ago
- Fix deadlock in discard stats (#1070) Fixes #1032 Currently discardStats flow is as follows: * Discard Stats are generated during compaction. At the end, compaction routine updates these stats in vl... — committed to dgraph-io/badger by ashish-goswami 5 years ago
@tony2001 we found the issue. we’re working on the fix. Will update you after it gets merged. 😃