rust-synapse-compress-state: synapse_auto_compressor panics on start after using the history delete api to remove remote events

Describe the bug

Nov 25 00:25:08 nordgedanken.dev systemd[1]: Starting State Compress Synapse...
Nov 25 00:25:08 nordgedanken.dev synapse_auto_compressor[3600348]: [2021-11-24T23:25:08Z INFO  synapse_auto_compressor] synapse_auto_compressor started
Nov 25 00:25:08 nordgedanken.dev synapse_auto_compressor[3600348]: [2021-11-24T23:25:08Z INFO  synapse_auto_compressor::manager] Running compressor on room !OGEhHVWSdvArJzumhm:matrix.org with chunk size 1000
Nov 25 00:25:10 nordgedanken.dev synapse_auto_compressor[3600348]: [2021-11-24T23:25:10Z ERROR panic] thread 'main' panicked at 'Missing 13719240': src/lib.rs:666
Nov 25 00:25:10 nordgedanken.dev systemd[1]: synapse-state-compress.service: Main process exited, code=exited, status=101/n/a
Nov 25 00:25:10 nordgedanken.dev systemd[1]: synapse-state-compress.service: Failed with result 'exit-code'.
Nov 25 00:25:10 nordgedanken.dev systemd[1]: Failed to start State Compress Synapse.

To Reproduce

Steps to reproduce the behavior:

  1. Run purge_history on a room
  2. Start synapse_auto_compressor

Expected behavior

It runs, or at least ignores the invalid rooms.
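
One way to check whether the panic is related to the purge is to look up the number from the panic message in state_groups, and to list compressor rows whose saved head no longer exists. This is only a minimal sketch. It assumes that the number after "Missing" is a state group ID, and that the compressor keeps its bookkeeping in state_compressor_state with a current_head column, as the cleanup script in the comments suggests. The function name is hypothetical, and connection settings are expected in the usual PG* environment variables.

```bash
#!/bin/bash
# Diagnostic sketch (read-only): is the state group from the panic still there?
# Assumption: the number in 'Missing 13719240' is a state group ID.

# Print the SQL for the check. The function name is hypothetical;
# :sg_id is a psql variable that is filled in with -v sg_id=...
missing_state_group_sql() {
  cat <<'_EOF'
SELECT EXISTS (SELECT 1 FROM state_groups WHERE id = :sg_id) AS state_group_exists;

SELECT scs.room_id, scs.current_head
FROM state_compressor_state AS scs
WHERE scs.current_head IS NOT NULL
  AND NOT EXISTS
    (SELECT 1 FROM state_groups AS sg WHERE sg.id = scs.current_head);
_EOF
}

# Run the check only when a state group ID is passed on the command line.
if [[ $# -ge 1 ]]; then
  # Accept digits only, because the value is substituted into the query as-is.
  [[ "$1" =~ ^[0-9]+$ ]] || { echo "state group ID must be numeric" >&2; exit 1; }
  missing_state_group_sql | psql -v ON_ERROR_STOP=1 -v sg_id="$1"
fi
```

If the first query reports that the state group does not exist and the second one lists the room from the log, stale compressor progress is a plausible cause of the panic.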

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 9
  • Comments: 15 (2 by maintainers)

Most upvoted comments

I always run the following script before a compressor run:

#!/bin/bash

# Clean up synapse_auto_compressor state tables.
#
# The compressor doesn't take into account
# - deleted rooms
# - state groups which got deleted either as unreferenced 
#   or due to retention time
#
# Procedure:
# 1. Delete progress related to deleted rooms
# 2. Delete progress for rooms where one of the referenced state groups
#    no longer exists
# 3. Replicate changes from state_compressor_state to state_compressor_progress

set -e -u -o pipefail

export PGHOST=$1
export PGDATABASE=$2
export PGUSER=$3
export PGPASSWORD=$4

SCRIPT=$(basename "$0")

echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ') INFO ${SCRIPT}]" Starting

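# Stop at the first failing statement and exit non-zero, so that set -e
# aborts the script instead of printing "Finished" after a failed cleanup.
psql -v ON_ERROR_STOP=1 <<_EOF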
BEGIN;

DELETE
FROM state_compressor_state AS scs
WHERE NOT EXISTS
    (SELECT *
     FROM rooms AS r
     WHERE r.room_id = scs.room_id);

DELETE
FROM state_compressor_state AS scs
WHERE scs.room_id IN
    (SELECT DISTINCT room_id
     FROM state_compressor_state AS scs2
     WHERE scs2.current_head IS NOT NULL
       AND NOT EXISTS
         (SELECT *
          FROM state_groups AS sg
          WHERE sg.id = scs2.current_head));

DELETE
FROM state_compressor_progress AS scp
WHERE NOT EXISTS
    (SELECT *
     FROM state_compressor_state AS scs
     WHERE scs.room_id = scp.room_id);

COMMIT;
_EOF

echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ') INFO ${SCRIPT}]" Finished
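
Before running the cleanup it may be useful to see how many rows it would touch. The following is a hedged dry-run sketch, not part of the original script: it reuses the WHERE clauses of the first two DELETE statements as SELECT count(*) queries and deletes nothing. The function name and the column aliases are hypothetical; the positional arguments follow the script above.

```bash
#!/bin/bash
# Dry-run sketch: count the rows the cleanup would remove, without deleting.

# Print the read-only preview queries (function name is hypothetical).
cleanup_preview_sql() {
  cat <<'_EOF'
-- Compressor state rows that belong to rooms which no longer exist
SELECT count(*) AS state_rows_for_deleted_rooms
FROM state_compressor_state AS scs
WHERE NOT EXISTS
    (SELECT * FROM rooms AS r WHERE r.room_id = scs.room_id);

-- Rooms whose saved head points at a state group that is gone
SELECT count(DISTINCT scs.room_id) AS rooms_with_missing_head
FROM state_compressor_state AS scs
WHERE scs.current_head IS NOT NULL
  AND NOT EXISTS
    (SELECT * FROM state_groups AS sg WHERE sg.id = scs.current_head);
_EOF
}

# Same arguments as the cleanup script: host, database, user, password.
if [[ $# -eq 4 ]]; then
  export PGHOST=$1 PGDATABASE=$2 PGUSER=$3 PGPASSWORD=$4
  cleanup_preview_sql | psql -v ON_ERROR_STOP=1
fi
```

The third DELETE in the script depends on the result of the first two, so it is left out of the preview.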

I’m not sure if this is helpful, but deleting the relevant room entries from state_compressor_state and state_compressor_progress at least made the auto compressor resume working for me.
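
For a single affected room, that deletion could be sketched roughly as follows. This is an illustration under assumptions, not a confirmed fix: the function name is hypothetical, the table and column names are taken from the cleanup script above, and connection settings are expected in the PG* environment variables. The room ID is passed as a psql variable so that it is quoted as a SQL literal.

```bash
#!/bin/bash
# Sketch: remove the compressor's bookkeeping for one room only.

# Print the SQL (function name is hypothetical). :'room_id' is a psql
# variable that psql substitutes as a properly quoted string literal.
room_cleanup_sql() {
  cat <<'_EOF'
BEGIN;
DELETE FROM state_compressor_progress WHERE room_id = :'room_id';
DELETE FROM state_compressor_state WHERE room_id = :'room_id';
COMMIT;
_EOF
}

# Run only when a room ID is given, e.g. '!OGEhHVWSdvArJzumhm:matrix.org'.
if [[ $# -ge 1 ]]; then
  room_cleanup_sql | psql -v ON_ERROR_STOP=1 -v room_id="$1"
fi
```

The SQL is fed to psql on standard input because psql does not interpolate variables in commands passed with -c.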

Thanks for trying that—I wonder what’s happened there?