badger: GC doesn't work? (not cleaning up SST files properly)
What version of Go are you using (go version)?
$ go version 1.13.8
What version of Badger are you using?
v1.6.0
opts := badger.DefaultOptions(dir + "/" + name)
opts.SyncWrites = false
opts.ValueLogLoadingMode = options.FileIO
Does this issue reproduce with the latest master?
With the latest master GC becomes much slower
What are the hardware specifications of the machine (RAM, OS, Disk)?
2TB NVME drive, 128 GB RAM
What did you do?
I have a Kafka topic with 12 partitions, and for every partition I create a database. Each database grows quite quickly (about 12*30 GB per hour) and the TTL for most of the events is 1h, so the size should stay at a constant level. For every partition I create a separate transaction and process read and write operations sequentially; there is no concurrency. When the transaction gets too big I commit it, and in a separate goroutine I start RunValueLogGC(0.5). Most GC runs end with ErrNoRewrite. I even tried repeating RunValueLogGC until I got 5 errors in a row, but I was still running out of disk space quite quickly. My current fix is to patch the Badger GC to make it run on every fid before the head. This works fine, but eventually becomes slow once there are too many log files.
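The retry strategy described above can be sketched as a small helper. This is a hypothetical illustration, not code from the issue: `gcUntilStable` and `errNoRewrite` are names I introduce here, with `errNoRewrite` standing in for `badger.ErrNoRewrite` and `gc` standing in for a closure over `db.RunValueLogGC(0.5)`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRewrite stands in for badger.ErrNoRewrite in this sketch.
var errNoRewrite = errors.New("value log GC attempt didn't result in any cleanup")

// gcUntilStable is a hypothetical helper mirroring the strategy described
// above: call gc (e.g. a closure over db.RunValueLogGC(0.5)) until it fails
// maxMisses times in a row with the "no rewrite" error. Any other error
// aborts the loop and is returned to the caller.
func gcUntilStable(gc func() error, maxMisses int) error {
	misses := 0
	for misses < maxMisses {
		err := gc()
		switch {
		case err == nil:
			misses = 0 // a value log file was rewritten; try again
		case errors.Is(err, errNoRewrite):
			misses++ // nothing reclaimable this round
		default:
			return err // real failure; surface it
		}
	}
	return nil
}

func main() {
	// Fake GC that rewrites two files, then has nothing left to do.
	rewritesLeft := 2
	calls := 0
	err := gcUntilStable(func() error {
		calls++
		if rewritesLeft > 0 {
			rewritesLeft--
			return nil
		}
		return errNoRewrite
	}, 5)
	fmt.Println(err, calls) // 2 successful runs + 5 consecutive misses = 7 calls
}
```

As the issue notes, even this kind of loop only helps if GC can actually find files whose discard ratio exceeds the threshold, which is exactly what was failing here.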
What did you expect to see?
The size of each of the twelve databases I created should stay at a constant level of less than 20 GB.
What did you see instead?
After running it for a day, if I look at one of twelve databases, I see 210 sst files, 68 vlog files, db size is 84 GB (and these numbers keep growing).
If I run badger histogram it shows me these stats:
Histogram of key sizes (in bytes)
Total count: 4499955
Min value: 13
Max value: 108
Mean: 22.92
Range        Count
[  8,  16)         2
[ 16,  32)   4499939
[ 64, 128)        14

Histogram of value sizes (in bytes)
Total count: 4499955
Min value: 82
Max value: 3603
Mean: 2428.16
Range          Count
[  64,  128)        1
[ 256,  512)    19301
[ 512, 1024)      459
[1024, 2048)      569
[2048, 4096)  4479625
2428 * 4479625 ≈ 10 GB, so the live data should only need about 10 GB.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 25 (14 by maintainers)
@jarifibrahim can you point us to the correct setting for letting BadgerDB clean deleted keys correctly? My understanding is that https://github.com/dgraph-io/badger/pull/1300 was closed because setting NumVersionsToKeep to zero actually results in an infinite number of versions being kept around. So what does one have to define in order to let BadgerDB clean deleted keys?

@crphang Actually, the community fixed it. The main PRs that fixed the GC issues were from community users 🎉
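For readers landing here, a minimal sketch of the option under discussion, assuming the v1.6 builder-style API; `openWithGCFriendlyOptions` is a hypothetical helper name, and the key point is taken from the thread itself: zero does not mean "keep none", it effectively keeps all versions.

```go
package main

import badger "github.com/dgraph-io/badger"

// openWithGCFriendlyOptions is a hypothetical helper showing the setting
// discussed above: leave NumVersionsToKeep at its default of 1 so that
// compaction can drop deleted and expired keys. Per the thread, setting
// it to 0 results in versions being kept around indefinitely.
func openWithGCFriendlyOptions(dir string) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir).
		WithNumVersionsToKeep(1) // the default; 0 would keep all versions
	return badger.Open(opts)
}
```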
@jarifibrahim really great work on the fixes — are there any plans for another release soon? I help maintain TalariaDB and have verified that the latest master resolves many of the issues our users have been facing with GC of the value log. Great work on that 🎉.