pebble: db: read-triggered compactions causing excessive write amplification
We observed the following in the CockroachDB telemetry cluster when switching from 20.2 to 21.1: the L4 shrank in size from 2GB to 120MB and from then one we see hugely imbalanced compactions with tiny bytes in L4 and huge bytes in L5 (see example below). These keep happening despite L4 score being low and compactions into L4 only adding ~30MB every ~3min to L4. The likely cause is read-triggered compactions. The write amp keeps growing to significantly > 100.
(internal links. more details: https://cockroachlabs.atlassian.net/browse/SREOPS-2002, discussion: https://cockroachlabs.slack.com/archives/CAC6K3SLU/p1621008284293900)
- We need the ability to change the
ReadSamplingMultiplier
for Pebble running in a CockroachDB node. To do this,pebble.Options.Parse
needs to support this option. - The compactionPicker should log the reason for choosing a compaction. Even the current limited logging does not show up in the logs, so we may have a plumbing problem too https://github.com/cockroachdb/pebble/blob/3ebefb1715efc7e9a3f8a4b7a6deb1ae9c62e3e5/compaction_picker.go#L942
- We need to better understand why we have such aggressive read-triggered compactions. The workload is not necessarily read heavy though there are periodic backups that could be reading a lot. It is possible that we have a bug where we are suggesting read triggered compactions when we shouldn’t.
- Even if there is a suggestion to do a read-triggered compactions, imbalanced levels like the ones here should block the compaction from happening.
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 +__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 +__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + WAL 1 21 M - 31 G - - - - 32 G - - - 1.0
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + 0 4 7.3 M 2.22 32 G 39 M 3 0 B 0 4.3 G 2.5 K 0 B 1 0.1
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + 1 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + 2 3 12 M 0.45 4.3 G 26 M 1 16 M 8 20 G 7.7 K 21 G 1 4.6
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + 3 4 23 M 0.21 3.1 G 0 B 0 329 M 87 28 G 11 K 29 G 1 9.1
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + 4 253 2.0 G 0.16 2.7 G 0 B 0 550 M 95 110 G 12 K 110 G 1 41.0
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + 5 1247 33 G 1.07 4.6 G 10 K 8 471 M 84 95 G 3.5 K 96 G 1 20.8
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + 6 5443 278 G - 4.8 G 0 B 0 0 B 0 30 G 540 32 G 1 6.2
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + total 6954 313 G - 32 G 65 M 12 1.3 G 274 320 G 37 K 287 G 6 10.0
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + flush 573
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 +compact 3668 20 M 5.0 G (size == estimated-debt, in = in-progress-bytes)
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + memtbl 1 64 M
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 +zmemtbl 0 0 B
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + ztbl 0 0 B
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + bcache 186 K 3.0 G 82.6% (score == hit-rate)
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + tcache 6.8 K 4.0 M 100.0% (score == hit-rate)
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + titers 9
I210514 10:59:46.038948 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5483737 + filter - - 91.5% (score == utility)
shrinks
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 +__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 +__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + WAL 1 1.7 M - 32 G - - - - 33 G - - - 1.0
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + 0 8 14 M 3.11 33 G 39 M 3 0 B 0 4.4 G 2.5 K 0 B 2 0.1
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + 1 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + 2 3 12 M 0.64 4.4 G 26 M 1 16 M 8 20 G 7.7 K 21 G 1 4.6
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + 3 7 39 M 0.49 3.1 G 0 B 0 364 M 96 28 G 11 K 29 G 1 9.1
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + 4 195 120 M 0.16 2.7 G 0 B 0 550 M 95 110 G 12 K 110 G 1 40.5
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + 5 1146 33 G 1.00 6.5 G 10 K 8 471 M 84 123 G 4.4 K 123 G 1 19.0
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + 6 5438 280 G - 6.7 G 0 B 0 0 B 0 40 G 701 42 G 1 5.9
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + total 6797 313 G - 33 G 65 M 12 1.4 G 283 358 G 38 K 325 G 7 11.0
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + flush 583
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 +compact 3750 26 M 4.7 G (size == estimated-debt, in = in-progress-bytes)
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + memtbl 1 64 M
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 +zmemtbl 3 192 M
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + ztbl 124 2.6 G
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + bcache 168 K 2.8 G 82.0% (score == hit-rate)
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + tcache 6.8 K 4.0 M 100.0% (score == hit-rate)
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + titers 37
I210514 11:09:46.079775 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 5586217 + filter - - 91.5% (score == utility)
imbalanced compaction:
cockroach-pebble.cockroachdb-0.root.2021-05-14T08_44_37Z.000012.log:I210514 12:30:14.020728 39821675 3@vendor/github.com/cockroachdb/pebble/compaction.go:1881 ⋮ [n1,pebble] 111993 [JOB 9125] compacted L4 [18347336 18347337 18347338 18347339 18347340 18347341 18347342 18347343 18347344 18347345 18347346 18347347 18347348 18347349 18347350 18347351 18347352 18347353 18347354 18347355 18347356 18347357 18347358 18347359 18347360 18347361 18347362 18347363 18347364 18347365 18347366 18347367 18347368 18347369 18347374 18347375 18347376 18347377 18347378 18347379] (12 M) + L5 [18346572 18346573 18346574 18346575 18346576 18346577 18346578 18346579 18346580 18346581 18346582 18346583 18346584 18346585 18346586 18346587 18346588 18346589 18346590 18346591 18346592 18346593 18346594 18346595 18346596 18346597 18346598 18346599 18346600 18346601 18346602 18346603 18346604 18346605 18346606 18346607 18346608 18346609 18346610 18346611 18346612 18346613 18346614 18346615 18346616 18346617 18346618 18346619 18346620 18346621 18346622 18346623 18346624 18346625 18346626 18346627 18346628 18346629 18346630 18346631 18346632 18346633 18346634 18346635 18346636 18346637 18346638 18346639 18346640 18346641 18346642 18346643 18346644 18346645 18346646 18346647 18346648 18346649 18346650 18346651 18346652 18346653 18346654 18346655 18346656 18346657 18346658 18346659 18346660 18346661 18346662 18346663 18346664 18346665 18346666 18346667 18346668 18346669 18346670 18346671 18346672 18346673 18346674 18346675 18346676 18346677 18346678 18346679 18346680 18346681 18346682 18346683 18346684 18346685 18346686 18346687 18346688 18346689 18346690 18346691 18346692 18346693 18346694 18346695 18346696 18346697 18346698 18346699 18346700 18346701 18346702 18346703 18346704 18346705 18346706 18346707 18346708 18346709 18346710 18346711 18346712 18346713 18346714 18346715 18346716 18346717 18346718 18346719 18346720 18346721 18346722 18346723 18345093 18345094 18345095 18345096 18345097 18345098 18345099 18345100 18345101 18345102 18345103 18345104 18345110 18345111 18345112 18345113 18345114 18345115 18345116 18345117 18345118 18345119 18345120 18345121 18345122 18345123 18345124 18345125 18345126 18346983 18346984 18346985 18346986 18346987 18346988 18346989 18346990 18346991 18346992 18346993 18346994 18346995 18346996 18346997 18346998 18346999 18347000 18347001 18347002 18347003 18347004 18347005 18347006 18347007 18347008 18347009 18347010 18347011 18347012 18347013 18347014 18347015 18347016 18347017 18347018 18347030 18347031 18347032 18347033 18347034 18347035 18347036 18347037 18347038 18347039 18347040 18347041 18347042 18347043 18347044 18347045 18347046 18347047 18347048 18347049 18347050 18347051 18347052 18347053 18347054 18347055 18347056 18347057 18347058 18347059 18347060 18347061 18347062 18347063 18347064 18347065 18347066 18347067 18347068 18347069 18347070] (6.7 G) -> L5 [18347433 18347434 18347435 18347436 18347437 18347438 18347439 18347440 18347441 18347442 18347443 18347444 18347445 18347446 18347447 18347448 18347449 18347450 18347451 18347452 18347453 18347454 18347455 18347456 18347457 18347458 18347459 18347460 18347461 18347462 18347463 18347464 18347465 18347466 18347467 18347468 18347469 18347470 18347471 18347472 18347473 18347474 18347475 18347476 18347477 18347478 18347479 18347480 18347481 18347482 18347483 18347484 18347485 18347486 18347487 18347488 18347489 18347490 18347491 18347492 18347493 18347494 18347495 18347496 18347497 18347498 18347499 18347500 18347501 18347502 18347503 18347504 18347505 18347506 18347507 18347508 18347509 18347510 18347511 18347512 18347513 18347514 18347515 18347516 18347517 18347518 18347519 18347520 18347521 18347522 18347523 18347524 18347525 18347526 18347527 18347528 18347529 18347530 18347531 18347532 18347533 18347534 18347535 18347536 18347537 18347538 18347539 18347540 18347541 18347542 18347543 18347544 18347545 18347546 18347547 18347548 18347549 18347550 18347551 18347552 18347553 18347554 18347555 18347556 18347557 18347558 18347559 18347560 18347561 18347562 18347569 18347570 18347571 18347572 18347573 18347574 18347575 18347576 18347577 18347578 18347579 18347580 18347581 18347582 18347583 18347584 18347585 18347586 18347587 18347588 18347589 18347590 18347591 18347592 18347593 18347594 18347595 18347596 18347597 18347598 18347599 18347600 18347601 18347602 18347603 18347604 18347605 18347606 18347607 18347608 18347609 18347610 18347611 18347612 18347613 18347614 18347615 18347616 18347617 18347618 18347619 18347620 18347621 18347622 18347623 18347624 18347625 18347626 18347627 18347628 18347629 18347630 18347631 18347632 18347633 18347634 18347635 18347636 18347637 18347638 18347639 18347640 18347641 18347642 18347643 18347644 18347645 18347646 18347647 18347648 18347649 18347650 18347651 18347652 18347653 18347654 18347655 18347656 18347657 18347658 18347659 18347660 18347661 18347662 18347663 18347664 18347665 18347666 18347667 18347668 18347669 18347670 18347671 18347672 18347673 18347674 18347675 18347676 18347677 18347678 18347679 18347680 18347681 18347682 18347683 18347684 18347685 18347686 18347687 18347688 18347689 18347690 18347691 18347692 18347693 18347694 18347695 18347696 18347697] (6.7 G), in 140.3s, output rate 49 M/s
growth in write amp:
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 +__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + WAL 1 3.9 M - 42 G - - - - 43 G - - - 1.0
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + 0 9 14 M 3.45 43 G 39 M 4 0 B 0 5.7 G 3.4 K 0 B 2 0.1
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + 1 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + 2 4 17 M 0.58 5.7 G 26 M 1 18 M 9 22 G 8.4 K 24 G 1 3.9
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + 3 13 63 M 0.57 3.7 G 0 B 0 790 M 219 30 G 11 K 30 G 1 8.1
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + 4 193 94 M 0.14 3.5 G 157 M 4 554 M 98 113 G 21 K 113 G 1 32.0
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + 5 1154 33 G 1.01 7.4 G 12 K 9 471 M 84 874 G 31 K 874 G 1 118.4
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + 6 5450 280 G - 7.4 G 0 B 0 0 B 0 46 G 823 49 G 1 6.3
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + total 6823 314 G - 43 G 222 M 18 1.8 G 410 1.1 T 75 K 1.1 T 7 26.3
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + flush 768
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 +compact 4291 31 M 7.7 G (size == estimated-debt, in = in-progress-bytes)
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + memtbl 1 64 M
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 +zmemtbl 0 0 B
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + ztbl 0 0 B
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + bcache 183 K 3.0 G 74.6% (score == hit-rate)
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + tcache 6.5 K 3.8 M 100.0% (score == hit-rate)
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + titers 2
I210514 14:59:46.106086 20815 kv/kvserver/store.go:2662 ⋮ [n1,s1] 7432754 + filter - - 92.5% (score == utility)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (12 by maintainers)
Commits related to this issue
- *: Add ReadCompactionAdded event to event listener This change, which is *not* meant for merging into a release or master branch, is solely to add a logging event for when a read compaction is append... — committed to cockroachdb/pebble by itsbilal 3 years ago
- *: Log info about read compactions This change, which is *not* meant for merging into a release or master branch, is solely to add logging events when a read compaction is appended to the db's slice,... — committed to itsbilal/pebble by itsbilal 3 years ago
- vendor: Bump pebble to 85e0b39a8d1b695815781eb1f1fe006179177914 This change is not destined for any release. It's solely to add logging to inform https://github.com/cockroachdb/pebble/issues/1143 Pu... — committed to cockroachdb/cockroach by itsbilal 3 years ago
- storage: Disable read sampling and read-triggered compactions This change sets the ReadSamplingMultiplier to a negative value, effectively disabling read sampling and therefore read-triggered compact... — committed to itsbilal/cockroach by itsbilal 3 years ago
- storage: Disable read sampling and read-triggered compactions This change sets the ReadSamplingMultiplier to a negative value, effectively disabling read sampling and therefore read-triggered compact... — committed to cockroachdb/cockroach by itsbilal 3 years ago
- storage: Disable read sampling and read compactions Read-triggered compactions are already disabled on 21.1. As the fixes to address known shortcomings with read-triggered compactions are a bit invol... — committed to itsbilal/cockroach by itsbilal 3 years ago
- Merge #65559 #66149 #66152 #66155 #66157 65559: tracing,tracingservice: adds a trace service to pull clusterwide trace spans r=irfansharif,abarganier a=adityamaru Previously, every node in the clust... — committed to cockroachdb/cockroach by deleted user 3 years ago
- compaction: read compaction fixes We were seeing some issues with read compactions as described here: https://github.com/cockroachdb/pebble/issues/1143 To prevent the issues, we came up with the fol... — committed to cockroachdb/pebble by bananabrick 3 years ago
you are right, I misread! I think we need to log whenever we append to that slice with the key interval and level. And also log when removing from that slice both the original level and what level was actually picked for the compaction. I still suspect that the large number of L4 files suggests the original addition was for a higher level. Maybe we are seeing enough file churn in L0 that the file changes, which would allow for repeated addition of the same key interval.