risingwave: scaling test panicked at 'overwrites an existing key'

Describe the bug

--- STDERR:              risingwave_simulation::nexmark_recovery nexmark_recovery_q103 ---
--
  | thread '<unnamed>' panicked at 'overwrites an existing key!
  | table_id: 1015, vnode: 137, key: OwnedRow([Some(Int64(2100))])
  | value in storage: OwnedRow([Some(Int64(2100))])
  | value to write: OwnedRow([Some(Int64(2100))])', /risingwave/src/stream/src/common/table/state_table.rs:875:13

Found in #7623. https://buildkite.com/risingwavelabs/pull-request/builds/18318#018691bc-bbb4-40e2-88d4-f4c9e1f52db2/116-197
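
For readers unfamiliar with this panic: the message suggests the state table checks, on insert, whether the key is already present in storage. Below is a minimal sketch of that kind of check. `ToyStateTable` and `WriteError` are hypothetical names made up for illustration, not RisingWave's actual `StateTable` API, and the sketch returns an error where the real code panics.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a state table (NOT RisingWave's `StateTable`).
#[derive(Default)]
struct ToyStateTable {
    /// Rows already written to "storage", keyed by primary key.
    storage: BTreeMap<i64, i64>,
}

/// Hypothetical error type mirroring the fields shown in the panic message.
#[derive(Debug, PartialEq)]
enum WriteError {
    OverwritesExistingKey {
        key: i64,
        in_storage: i64,
        to_write: i64,
    },
}

impl ToyStateTable {
    /// Insert a row, refusing to overwrite a key that is already present.
    fn insert(&mut self, key: i64, value: i64) -> Result<(), WriteError> {
        if let Some(&in_storage) = self.storage.get(&key) {
            // An insert for an existing key means the caller's view of the
            // table (e.g. an operator cache) disagrees with storage.
            return Err(WriteError::OverwritesExistingKey {
                key,
                in_storage,
                to_write: value,
            });
        }
        self.storage.insert(key, value);
        Ok(())
    }
}
```

In this sketch, inserting key `2100` twice fails on the second call with the same value on both sides, which matches the shape of the log above where the value in storage and the value to write are identical.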

To Reproduce

No response

Expected behavior

No response

Additional context

No response

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 4
  • Comments: 17 (16 by maintainers)

Most upvoted comments

cc @zwang28, PTAL.

I added

                    self.side_l.ht.clear();
                    self.side_r.ht.clear();

after the `flush_data` call in the barrier branch, and the test passed.

                AlignedMessage::Barrier(barrier) => {
                    let barrier_start_time = minstant::Instant::now();
                    self.flush_data(barrier.epoch).await?;

                    self.side_l.ht.clear();
                    self.side_r.ht.clear();


    pub fn clear(&mut self) {
        self.inner.clear();
    }

cc @yuhao-su @hzxa21

Is it possible that a vnode is moved back and forth multiple times for an actor and the operator cache for the vnode is stale?

No, the operator cache is cleared upon a vnode change.
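
To make the claim above concrete, here is a rough sketch of "clear the cache when the owned vnodes change". Everything in it (`ToyOperatorCache`, `update_vnodes`, the field names) is a hypothetical illustration, not the actual RisingWave join hash table or its vnode-bitmap handling.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical operator-side cache (NOT RisingWave's join hash table).
struct ToyOperatorCache {
    /// Virtual nodes currently owned by this actor.
    vnodes: BTreeSet<u16>,
    /// Cached rows, keyed by join key.
    entries: HashMap<i64, Vec<i64>>,
}

impl ToyOperatorCache {
    fn new(vnodes: BTreeSet<u16>) -> Self {
        Self {
            vnodes,
            entries: HashMap::new(),
        }
    }

    /// Replace the owned vnode set. If ownership changed, drop every cached
    /// entry so nothing stale survives the move. Returns whether the cache
    /// was cleared.
    fn update_vnodes(&mut self, new_vnodes: BTreeSet<u16>) -> bool {
        let changed = self.vnodes != new_vnodes;
        if changed {
            self.entries.clear();
        }
        self.vnodes = new_vnodes;
        changed
    }
}
```

Under this scheme a vnode that moves away and later comes back still triggers a clear on each move, so entries cached before the first move cannot be reused afterwards.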

I also tried changing the following config to in-memory, and the test passed. Is it possible that there is a Hummock bug? cc @hzxa21 @wenym1, PTAL.

https://github.com/singularity-data/risingwave/blob/f671f09be0bcaa045c2812c444486dec1323e44f/src/tests/simulation/src/cluster.rs#L247

Cc @yuhao-su, PTAL.