risingwave: bug: compactor block builder panics on assertion failed

https://github.com/singularity-data/risingwave/blob/001f5ac22ca46aae2e2df3dc265e4d0fc6c12d81/src/storage/src/hummock/sstable/block.rs#L227-L247

thread 'tokio-runtime-worker' panicked at 'assertion failed: `(left == right)`
  left: `Equal`,
 right: `Less`', src/storage/src/hummock/sstable/block.rs:243:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/panicking.rs:181:5
   4: risingwave_storage::hummock::sstable::block::BlockBuilder::add
             at ./src/storage/src/hummock/sstable/block.rs:243:13
   5: risingwave_storage::hummock::sstable::builder::SstableBuilder::add
             at ./src/storage/src/hummock/sstable/builder.rs:130:9
   6: risingwave_storage::hummock::sstable::multi_builder::CapacitySplitTableBuilder<F>::add_full_key::{{closure}}
             at ./src/storage/src/hummock/sstable/multi_builder.rs:121:9
   7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/future/mod.rs:91:19
   8: risingwave_storage::hummock::compactor::Compactor::compact_and_build_sst::{{closure}}
             at ./src/storage/src/hummock/compactor.rs:1021:92
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/future/mod.rs:91:19
  10: risingwave_storage::hummock::compactor::Compactor::compact_key_range_impl::{{closure}}
             at ./src/storage/src/hummock/compactor.rs:716:10
  11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/future/mod.rs:91:19
  12: risingwave_storage::hummock::compactor::Compactor::compact_key_range_with_filter::{{closure}}
             at ./src/storage/src/hummock/compactor.rs:782:74
  13: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/future/mod.rs:91:19
  14: risingwave_storage::hummock::compactor::Compactor::compact::{{closure}}::{{closure}}
             at ./src/storage/src/hummock/compactor.rs:556:84
  15: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/future/mod.rs:91:19
  16: risingwave_storage::hummock::compaction_executor::CompactionExecutor::send_request::{{closure}}
             at ./src/storage/src/hummock/compaction_executor.rs:76:27
  17: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/future/mod.rs:91:19
  18: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/future/future.rs:124:9
  19: <tokio::task::task_local::TaskLocalFuture<T,F> as core::future::future::Future>::poll::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/task/task_local.rs:345:35
  20: tokio::task::task_local::LocalKey<T>::scope_inner
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/task/task_local.rs:231:19
  21: <tokio::task::task_local::TaskLocalFuture<T,F> as core::future::future::Future>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/task/task_local.rs:342:13
  22: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.35/src/instrument.rs:272:9
  23: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/core.rs:165:17
  24: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/loom/std/unsafe_cell.rs:14:9
  25: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/core.rs:155:13
  26: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:480:19
  27: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/panic/unwind_safe.rs:271:9
  28: std::panicking::try::do_call
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/panicking.rs:492:40
  29: ___rust_try
  30: std::panicking::try
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/panicking.rs:456:19
  31: std::panic::catch_unwind
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/panic.rs:137:14
  32: tokio::runtime::task::harness::poll_future
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:468:18
  33: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:104:27
  34: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:57:15
  35: tokio::runtime::task::raw::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/raw.rs:144:5
  36: tokio::runtime::task::raw::RawTask::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/raw.rs:84:18
  37: tokio::runtime::task::LocalNotified<S>::run
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/mod.rs:381:9
  38: tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/thread_pool/worker.rs:435:13
  39: tokio::coop::with_budget::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/coop.rs:102:9
  40: std::thread::local::LocalKey<T>::try_with
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/thread/local.rs:445:16
  41: std::thread::local::LocalKey<T>::with
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/thread/local.rs:421:9
  42: tokio::coop::with_budget
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/coop.rs:95:5
  43: tokio::coop::budget
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/coop.rs:72:5
  44: tokio::runtime::thread_pool::worker::Context::run_task
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/thread_pool/worker.rs:434:9
  45: tokio::runtime::thread_pool::worker::Context::run
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/thread_pool/worker.rs:401:24
  46: tokio::runtime::thread_pool::worker::run::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/thread_pool/worker.rs:386:17
  47: tokio::macros::scoped_tls::ScopedKey<T>::set
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/macros/scoped_tls.rs:61:9
  48: tokio::runtime::thread_pool::worker::run
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/thread_pool/worker.rs:383:5
  49: tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/thread_pool/worker.rs:362:45
  50: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/blocking/task.rs:42:21
  51: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.35/src/instrument.rs:272:9
  52: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/core.rs:165:17
  53: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/loom/std/unsafe_cell.rs:14:9
  54: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/core.rs:155:13
  55: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:480:19
  56: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/core/src/panic/unwind_safe.rs:271:9
  57: std::panicking::try::do_call
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/panicking.rs:492:40
  58: ___rust_try
  59: std::panicking::try
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/panicking.rs:456:19
  60: std::panic::catch_unwind
             at /rustc/9067d5277d10f0f32a49ec9c125a33828e26a32b/library/std/src/panic.rs:137:14
  61: tokio::runtime::task::harness::poll_future
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:468:18
  62: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:104:27
  63: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/harness.rs:57:15
  64: tokio::runtime::task::raw::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/raw.rs:144:5
  65: tokio::runtime::task::raw::RawTask::poll
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/raw.rs:84:18
  66: tokio::runtime::task::UnownedTask<S>::run
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/task/mod.rs:418:9
  67: tokio::runtime::blocking::pool::Task::run
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/blocking/pool.rs:91:9
  68: tokio::runtime::blocking::pool::Inner::run
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/blocking/pool.rs:325:17
  69: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /Users/bugenzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/runtime/blocking/pool.rs:300:13

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 16 (13 by maintainers)

Commits related to this issue

Most upvoted comments

Besides this bug, should we add kind of assertion that ensures the assumption “CNs never write same key with each other in one epoch”, so that when similar issues happen, we can narrow down the scope of investigation.

With such sanity check , duplicate key originated from different CNs is spotted in source executor.

compactor-0 | 2022-08-22T06:23:18.554422Z ERROR risingwave_storage::hummock::validator: Duplicate key [74, 0, 0, 3, eb, 32, 34, 2d, 30, ff, f5, c6, 82, 89, de, ff, ff] in SST 62 from worker 3 and SST 42 from worker 1

The hex ‘32, 34, 2d, 30’ in duplicate key is ‘24-0’, that is https://github.com/singularity-data/risingwave/blob/17edbcfe73d7f40cc74713811ec7827c1c40b776/src/connector/src/source/nexmark/split.rs#L31

So there is something wrong with the state of Source? cc. @shanicky @tabVersion

Hi @hzxa21 @BugenZhao , any updates?

Besides this bug, should we add kind of assertion that ensures the assumption “CNs never write same key with each other in one epoch”, so that when similar issues happen, we can narrow down the scope of investigation.

+1. We should assert it somewhere explicitly instead of panicking on usage.

I think I identified the problem. I add some log here https://github.com/singularity-data/risingwave/blob/8abbb65c4414ac08bef57cae325d93885291568c/src/meta/src/stream/source_manager.rs#L561-L573

and I got the following log

{52: ['24-18', '24-7'], 56: ['24-2', '24-5'], 36: ['24-13', '24-8'], 49: ['24-13', '24-3'], 27: ['24-0', '24-16'], 51: ['24-14', '24-23'], 28: ['24-19', '24-17'], 58: ['24-16', '24-21'], 53: ['24-15', '24-1'], 25: ['24-9', '24-14'], 60: ['24-19', '24-12'], 35: ['24-5', '24-11'], 33: ['24-1', '24-7'], 59: ['24-11', '24-8'], 50: ['24-0', '24-6'], 54: ['24-20', '24-17'], 26: ['24-23', '24-15'], 29: ['24-20', '24-2'], 31: ['24-21', '24-12'], 30: ['24-18', '24-3'], 55: ['24-22', '24-4'], 57: ['24-9', '24-10'], 32: ['24-10', '24-6'], 34: ['24-22', '24-4']}
==== build reverse index ====
{'24-18': [52, 30], '24-7': [52, 33], '24-2': [56, 29], '24-5': [56, 35], '24-13': [36, 49], '24-8': [36, 59], '24-3': [49, 30], '24-0': [27, 50], '24-16': [27, 58], '24-14': [51, 25], '24-23': [51, 26], '24-19': [28, 60], '24-17': [28, 54], '24-21': [58, 31], '24-15': [53, 26], '24-1': [53, 33], '24-9': [25, 57], '24-12': [60, 31], '24-11': [35, 59], '24-6': [50, 32], '24-20': [54, 29], '24-22': [55, 34], '24-4': [55, 34], '24-10': [57, 32]}

we can see each SplitId is assigned to two actors, which may on different CNs.

The root cause for the bug is that SplitId is not unique in global

btw: after the crash, CN will use up all CPU resources, it is expected?

It seems that some key with the same epoch are written to builder by twice?