badger: deadlock when writing into Badger using multiple routines

go version go1.13 linux/amd64 github.com/dgraph-io/badger v2.0.0-rc3+incompatible I have to admit I didn’t try latest master, but I’ve seen this issue since early versions of Badger 2.0 and it’s still there with rc3. The server is running on 56-core Intel Xeon E5-2660, no idea about the disk.

I’m reading several multi-gigabyte files in their own binary format and “converting” them into Badger, writing data encoded with Protobuf. The initial data is organized in batches (of arbitrary size, from 1 to 1000+ records in a batch), so I’m using WriteBatches to write the actual data and then badger.Update() to update the counters (stored in the same badger db). To speed up the process I’m using 64 goroutines that listen on channel with data and do the actual encoding/writing.

The problem is that after some time the badger deadlocks and stops writing anything, the process is stuck with the backtraces provided below. As you can see from the backtrace 3, I tried to use Txn.Set/Commit directly instead of using badger.Update, but it didn’t help. Unfortunately, I’m unable to provide a reproduce code so far, as the issue seems to be reproducible only on really large amounts of data after a couple of hours of running with no error messages whatsoever. To make it more complicated, I’m unable to reproduce it in 100% of cases even using the same initial data and the same Go code.

My code looks very much like this (in simplified go “pseudocode”):

//this is the function that's called from 64 worker routines
func LoadData(records []*Record) error { 
    wb := badger.NewWriteBatch()
    defer wb.Cancel()
    for ... {
           err = wb.Set(key, value)
           if err != nil {
               return err
           }
    }
    err = wb.Flush()
    if err == nil {
        updateStats(..)
    }
    return err
}

func updateStats(...) error {
    statsLock.Lock()
    defer statsLock.Unlock()
    stats += value
    //...serialize value into val
    txn := badger.NewTransaction(true)
    defer txn.Discard()

    err = txn.Set(key, val)
    if err != nil {
        return err
    }

    err = txn.Commit()
    if err != nil {
        return err
    }
    return err
}

Backtraces: 1.

6 @ 0x43d600 0x44dcc0 0x44dcab 0x44d912 0x4762e4 0x976709 0x474c83 0x975cb2 0x975c5e 0x985549 0xa16b0b 0xa800ac 0x46b381
#	0x44d911	sync.runtime_Semacquire+0x41					/home/tony/go/src/runtime/sema.go:56
#	0x4762e3	sync.(*WaitGroup).Wait+0x63					/home/tony/go/src/sync/waitgroup.go:130
#	0x976708	github.com/dgraph-io/badger/y.(*Throttle).Finish.func1+0x38	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/y/y.go:292
#	0x474c82	sync.(*Once).doSlow+0xe2					/home/tony/go/src/sync/once.go:66
#	0x975cb1	sync.(*Once).Do+0x71						/home/tony/go/src/sync/once.go:57
#	0x975c5d	github.com/dgraph-io/badger/y.(*Throttle).Finish+0x1d		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/y/y.go:291
#	0x985548	github.com/dgraph-io/badger.(*WriteBatch).Flush+0x78		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/batch.go:159
#	0xa16b0a	go.badoo.dev/meetlist/db.(*DB).AddRomancesFromSnapshot+0x88a	/local/eye/git/meetlist/db/db.go:691
#	0xa800ab	main.snapshotWorker+0x24b					/local/eye/git/meetlist/main.go:617
6 @ 0x43d600 0x44dcc0 0x44dcab 0x44d912 0x4762e4 0x9b5ff1 0x9bfbc0 0x9ae911 0x46b381
#	0x44d911	sync.runtime_Semacquire+0x41					/home/tony/go/src/runtime/sema.go:56
#	0x4762e3	sync.(*WaitGroup).Wait+0x63					/home/tony/go/src/sync/waitgroup.go:130
#	0x9b5ff0	github.com/dgraph-io/badger.(*request).Wait+0x30		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:938
#	0x9bfbbf	github.com/dgraph-io/badger.(*Txn).commitAndSend.func1+0x3f	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:501
#	0x9ae910	github.com/dgraph-io/badger.runTxnCallback+0x50			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:572
1 @ 0x43d600 0x44d15b 0x97485c 0x9ac2d0 0x9aedc7 0xa13f8c 0xa13f72 0xa14406 0xa16b74 0xa800ac 0x46b381
#	0x97485b	github.com/dgraph-io/badger/y.(*WaterMark).WaitForMark+0x13b	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/y/watermark.go:124
#	0x9ac2cf	github.com/dgraph-io/badger.(*oracle).readTs+0xdf		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:119
#	0x9aedc6	github.com/dgraph-io/badger.(*DB).newTransaction+0xc6		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:666
#	0xa13f8b	github.com/dgraph-io/badger.(*DB).NewTransaction+0x10b		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/txn.go:634
#	0xa13f71	go.badoo.dev/meetlist/db.(*DB).writeStatsEntry+0xf1		/local/eye/git/meetlist/db/db.go:154
#	0xa14405	go.badoo.dev/meetlist/db.(*DB).updateStats+0xb5			/local/eye/git/meetlist/db/db.go:181
#	0xa16b73	go.badoo.dev/meetlist/db.(*DB).AddRomancesFromSnapshot+0x8f3	/local/eye/git/meetlist/db/db.go:694
#	0xa800ab	main.snapshotWorker+0x24b					/local/eye/git/meetlist/main.go:617

There are some more routines with their own traces, dunno if they are related to the issue or not:

1 @ 0x43d600 0x44d15b 0x98b827 0x46b381
#	0x98b826	github.com/dgraph-io/badger.(*DB).doWrites+0x2d6	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:734

1 @ 0x43d600 0x44d15b 0x98d800 0x46b381
#	0x98d7ff	github.com/dgraph-io/badger.(*DB).updateSize+0x15f	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:1021

1 @ 0x43d600 0x44d15b 0x9a5eff 0x46b381
#	0x9a5efe	github.com/dgraph-io/badger.(*publisher).listenForUpdates+0x17e	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/publisher.go:67

1 @ 0x43d600 0x44dcc0 0x44dcab 0x44da27 0x47483c 0x9b8c6f 0x9b8aab 0x99c9ce 0x99e556 0x99ef97 0x99a1d9 0x46b381
#	0x44da26	sync.runtime_SemacquireMutex+0x46						/home/tony/go/src/runtime/sema.go:71
#	0x47483b	sync.(*Mutex).lockSlow+0xfb							/home/tony/go/src/sync/mutex.go:138
#	0x9b8c6e	sync.(*Mutex).Lock+0x1fe							/home/tony/go/src/sync/mutex.go:81
#	0x9b8aaa	github.com/dgraph-io/badger.(*valueLog).updateDiscardStats+0x3a			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:1385
#	0x99c9cd	github.com/dgraph-io/badger.(*levelsController).compactBuildTables+0x208d	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:636
#	0x99e555	github.com/dgraph-io/badger.(*levelsController).runCompactDef+0xc5		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:791
#	0x99ef96	github.com/dgraph-io/badger.(*levelsController).doCompact+0x4b6			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:860
#	0x99a1d8	github.com/dgraph-io/badger.(*levelsController).runWorker+0x318			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:356

1 @ 0x43d600 0x44dcc0 0x44dcab 0x44da27 0x47483c 0x9b8f80 0x9b8cc0 0x9b8c02 0x99c9ce 0x99e556 0x99ef97 0x99a1d9 0x46b381
#	0x44da26	sync.runtime_SemacquireMutex+0x46						/home/tony/go/src/runtime/sema.go:71
#	0x47483b	sync.(*Mutex).lockSlow+0xfb							/home/tony/go/src/sync/mutex.go:138
#	0x9b8f7f	sync.(*Mutex).Lock+0x2ef							/home/tony/go/src/sync/mutex.go:81
#	0x9b8cbf	github.com/dgraph-io/badger.(*valueLog).flushDiscardStats+0x2f			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:1403
#	0x9b8c01	github.com/dgraph-io/badger.(*valueLog).updateDiscardStats+0x191		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/value.go:1393
#	0x99c9cd	github.com/dgraph-io/badger.(*levelsController).compactBuildTables+0x208d	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:636
#	0x99e555	github.com/dgraph-io/badger.(*levelsController).runCompactDef+0xc5		/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:791
#	0x99ef96	github.com/dgraph-io/badger.(*levelsController).doCompact+0x4b6			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:860
#	0x99a1d8	github.com/dgraph-io/badger.(*levelsController).runWorker+0x318			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:356

1 @ 0x43d600 0x459807 0x4597dd 0x98af33 0x9bb65a 0x46b381
#	0x4597dc	time.Sleep+0x12c					/home/tony/go/src/runtime/time.go:105
#	0x98af32	github.com/dgraph-io/badger.(*DB).writeRequests+0x242	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:657
#	0x9bb659	github.com/dgraph-io/badger.(*DB).doWrites.func1+0x59	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:706

1 @ 0x43d600 0x459807 0x4597dd 0x99f7ab 0x98cf49 0x98d1bb 0x9bb1a7 0x46b381
#	0x4597dc	time.Sleep+0x12c							/home/tony/go/src/runtime/time.go:105
#	0x99f7aa	github.com/dgraph-io/badger.(*levelsController).addLevel0Table+0x37a	/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/levels.go:907
#	0x98cf48	github.com/dgraph-io/badger.(*DB).handleFlushTask+0x7b8			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:925
#	0x98d1ba	github.com/dgraph-io/badger.(*DB).flushMemtable+0x16a			/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:941
#	0x9bb1a6	github.com/dgraph-io/badger.Open.func4+0x36				/home/tony/go-packages/pkg/mod/github.com/dgraph-io/badger@v2.0.0-rc3+incompatible/db.go:303

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 18 (8 by maintainers)

Commits related to this issue

Most upvoted comments

@tony2001 we found the issue. we’re working on the fix. Will update you after it gets merged. 😃