patroni: [PostgreSQL PANIC] Crash-recovery loop when Postgres could not flush dirty data: Structure needs cleaning

What happened?

This issue was found given the following scenario:

Node 3 is the leader; nodes 1 and 2 are replicas:

+ Cluster: postgres-cluster (7145069559047319590) --+-----------+
| Member       | Host      | Role    | State   | TL | Lag in MB |
+--------------+-----------+---------+---------+----+-----------+
| patroni-IP_1 | IP_1:8432 | Replica | running | 92 |         0 |
| patroni-IP_2 | IP_2:8432 | Replica | running | 92 |         0 |
| patroni-IP_3 | IP_3:8432 | Leader  | running | 94 |           |
+--------------+-----------+---------+---------+----+-----------+

Node 3 crashed with a PANIC, possibly related to a database query error (the leader went from running => crash => running => stuck in a crash loop).

+ Cluster: postgres-cluster (7145069559047319590) --+-----------+
| Member       | Host      | Role    | State   | TL | Lag in MB |
+--------------+-----------+---------+---------+----+-----------+
| patroni-IP_1 | IP_1:8432 | Replica | running | 92 |         0 |
| patroni-IP_2 | IP_2:8432 | Replica | running | 92 |         0 |
| patroni-IP_3 | IP_3:8432 | Leader  | crash   |    |           |
+--------------+-----------+---------+---------+----+-----------+

PostgreSQL error log:

2023-05-17 06:41:52.760 UTC [2514] PANIC:  could not flush dirty data: Structure needs cleaning
2023-05-17 06:41:52.986 UTC [31] LOG:  checkpointer process (PID 2514) was terminated by signal 6: Aborted
2023-05-17 06:41:52.986 UTC [31] LOG:  terminating any other active server processes

Patroni detects that the leader is unhealthy and automatically tries to recover Postgres:

    def _run_cycle(self):
        dcs_failed = False
        # ...
            if not self.state_handler.is_healthy():
                if self.is_paused():
                    self.state_handler.set_state('stopped')
                    if self.has_lock():
                        self._delete_leader()
                        return 'removed leader lock because postgres is not running'
                    # Normally we don't start Postgres in a paused state. We make an exception for the demoted primary
                    # that needs to be started after it had been stopped by demote. When there is no need to call rewind
                    # the demote code follows through to starting Postgres right away, however, in the rewind case
                    # it returns from demote and reaches this point to start PostgreSQL again after rewind. In that
                    # case it makes no sense to continue to recover() unless rewind has finished successfully.
                    elif self._rewind.failed or not self._rewind.executed and not \
                            (self._rewind.is_needed and self._rewind.can_rewind_or_reinitialize_allowed):
                        return 'postgres is not running'

                if self.state_handler.state in ('running', 'starting'):
                    self.state_handler.set_state('crashed')
                # try to start dead postgres
                return self.recover()

In recovery mode, Patroni checks the master_start_timeout setting. A failover is only attempted when master_start_timeout == 0; with master_start_timeout > 0 Patroni keeps restarting Postgres to recover the master node, so in my case node 3 restarts and crashes in a loop (see the sketch after the snippets below).

    def recover(self):
        # Postgres is not running and we will restart in standby mode. Watchdog is not needed until we promote.
        self.watchdog.disable()

        if self.has_lock() and self.update_lock():
            timeout = self.patroni.config['master_start_timeout']
            if timeout == 0:
                # We are requested to prefer failing over to restarting master. But see first if there
                # is anyone to fail over to.
                if self.is_failover_possible(self.cluster.members):
                    logger.info("Master crashed. Failing over.")
                    self.demote('immediate')
                    return 'stopped PostgreSQL to fail over after a crash'
        else:
            timeout = None

        data = self.state_handler.controldata()
        logger.info('pg_controldata:\n%s\n', '\n'.join('  {0}: {1}'.format(k, v) for k, v in data.items()))
        if data.get('Database cluster state') in ('in production', 'shutting down', 'in crash recovery'):
            msg = self._handle_crash_recovery()
            if msg:
                return msg

    def _handle_crash_recovery(self):
        if not self._crash_recovery_executed and (self.cluster.is_unlocked() or self._rewind.can_rewind):
            self._crash_recovery_executed = True
            self._crash_recovery_started = time.time()
            msg = 'doing crash recovery in a single user mode'
            return self._async_executor.try_run_async(msg, self._rewind.ensure_clean_shutdown) or msg

    def single_user_mode(self, communicate=None, options=None):
        """run a given command in a single-user mode. If the command is empty - then just start and stop"""
        cmd = [self._postgresql.pgcommand('postgres'), '--single', '-D', self._postgresql.data_dir]
        for opt, val in sorted((options or {}).items()):
            cmd.extend(['-c', '{0}={1}'.format(opt, val)])
        # need a database name to connect
        cmd.append('template1')
        return self._postgresql.cancellable.call(cmd, communicate=communicate)

    def ensure_clean_shutdown(self):
        self.cleanup_archive_status()

        # Start in a single user mode and stop to produce a clean shutdown
        opts = self.read_postmaster_opts()
        opts.update({'archive_mode': 'on', 'archive_command': 'false'})
        self._postgresql.config.remove_recovery_conf()
        output = {}
        ret = self.single_user_mode(communicate=output, options=opts)
        if ret != 0:
            logger.error('Crash recovery finished with code=%s', ret)
            logger.info(' stdout=%s', output['stdout'].decode('utf-8'))
            logger.info(' stderr=%s', output['stderr'].decode('utf-8'))
        return ret == 0 or None
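
To make the expected behavior concrete, here is a minimal sketch (not Patroni code) of the kind of guard this report is asking for: stop retrying and fail over after a few consecutive crash-recovery failures even when master_start_timeout > 0. The CrashLoopGuard class and the MAX_CRASH_RECOVERY_ATTEMPTS limit are hypothetical; only is_failover_possible() and demote('immediate') come from the recover() snippet above.

    # Hypothetical sketch, not Patroni source code.
    MAX_CRASH_RECOVERY_ATTEMPTS = 3  # assumed limit, not a real Patroni setting

    class CrashLoopGuard:
        def __init__(self, ha):
            self.ha = ha        # object exposing is_failover_possible()/demote() as shown above
            self.failures = 0

        def after_crash_recovery(self, succeeded):
            """Record the outcome of one crash-recovery attempt and decide what to do next."""
            if succeeded:
                self.failures = 0
                return None
            self.failures += 1
            if self.failures >= MAX_CRASH_RECOVERY_ATTEMPTS \
                    and self.ha.is_failover_possible(self.ha.cluster.members):
                self.ha.demote('immediate')
                return 'stopped PostgreSQL to fail over after repeated crash recovery failures'
            return 'retrying crash recovery'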

How can we reproduce it (as minimally and precisely as possible)?

Possibly related to a database query error that crashed the database.

What did you expect to happen?

I expected Patroni to fail over to a replica when Postgres on the leader keeps crashing, even though master_start_timeout > 0.
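
Note that the recover() snippet above only prefers failover over restarting the master when master_start_timeout is 0, so a possible stop-gap (until the underlying storage problem is fixed) is to set it to 0 in the dynamic configuration. A minimal sketch, assuming Patroni's REST API is reachable at the restapi address from the configuration below; patronictl edit-config can be used instead:

    # Sketch: set master_start_timeout=0 in the dynamic (DCS) configuration so
    # that Patroni prefers failing over to restarting a crashed master.
    import json
    import urllib.request

    req = urllib.request.Request(
        'http://IP_3:8000/config',      # restapi connect_address from the config below
        data=json.dumps({'master_start_timeout': 0}).encode(),
        headers={'Content-Type': 'application/json'},
        method='PATCH',
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())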

Patroni/PostgreSQL/DCS version

  • Patroni version: 2.1.4
  • PostgreSQL version: 13.6
  • DCS (and its version): etcd:v3.5.2

Patroni configuration file

scope: postgres-cluster
name: patroni-IP_3
restapi:
    listen: IP_3:8000
    connect_address: IP_3:8000

etcd:
    hosts: IP_1:2379,IP_2:2379,IP_3:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5

postgresql:
    listen: IP_3:8432
    connect_address: IP_3:8432
    data_dir: /path_data
    pgpass: /path_pass
    authentication:
        replication:
            username: postgres
            password: 
        superuser:
            username: postgres
            password: 
        rewind:  # Has no effect on postgres 10 and lower
            username: postgres
            password: 
    parameters:
        unix_socket_directories: /path_data
        archive_mode: off
        logging_collector: true
        log_directory: /path_data
        log_filename: postgres.log
        log_rotation_size: 50000
        log_truncate_on_rotation: true
log:
    level: INFO
    dir: /path_data
    file_num: 1

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

patronictl show-config

Done

Patroni log files

2023-05-17 14:06:11,930 INFO: Lock owner: patroni-IP_3; I am patroni-IP_3
2023-05-17 14:06:11,940 INFO: starting as readonly because i had the session lock
2023-05-17 14:06:11,942 INFO: closed patroni connection to the postgresql cluster
2023-05-17 14:06:12,147 INFO: postmaster pid=8985
2023-05-17 14:06:15,427 INFO: Lock owner: patroni-IP_3; I am patroni-IP_3
2023-05-17 14:06:15,427 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 14:06:15,472 INFO: promoted self to leader because I had the session lock
2023-05-17 14:06:15,477 INFO: cleared rewind state after becoming the leader
2023-05-17 14:06:16,581 INFO: no action. I am (patroni-IP_3), the leader with the lock
2023-05-17 14:06:16,778 ERROR: Exception during CHECKPOINT
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/__init__.py", line 606, in checkpoint
    cur.execute('CHECKPOINT')
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

2023-05-17 14:06:26,497 WARNING: Postgresql is not running.
2023-05-17 14:06:26,497 INFO: Lock owner: patroni-IP_3; I am patroni-IP_3
2023-05-17 14:06:26,516 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7145069559047319590
  Database cluster state: shutting down
  pg_control last modified: Wed May 17 14:06:18 2023
  Latest checkpoint location: 4/8A4117B8
  Latest checkpoint's REDO location: 4/8A3150D8
  Latest checkpoint's REDO WAL file: 0000001E000000040000008A
  Latest checkpoint's TimeLineID: 30
  Latest checkpoint's PrevTimeLineID: 30
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:9182574
  Latest checkpoint's NextOID: 216022
  Latest checkpoint's NextMultiXactId: 375408
  Latest checkpoint's NextMultiOffset: 1532251
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 9182574
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Wed May 17 13:36:33 2023
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 4/8A5D5128
  Min recovery ending loc's timeline: 93
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: on
  max_connections setting: 400
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 0
  Mock authentication nonce: f297655fad1846ebdda6a1ce18a2589b32c8d612deb1339e34d5a18480258962

2023-05-17 14:06:26,547 INFO: doing crash recovery in a single user mode
2023-05-17 14:06:28,879 ERROR: Crash recovery finished with code=-6
2023-05-17 14:06:28,879 INFO:  stdout=
2023-05-17 14:06:28,879 INFO:  stderr=2023-05-17 07:06:26.685 UTC [9025] LOG:  database system shutdown was interrupted; last known up at 2023-05-17 07:06:18 UTC
2023-05-17 07:06:28.587 UTC [9025] LOG:  database system was not properly shut down; automatic recovery in progress
2023-05-17 07:06:28.587 UTC [9025] LOG:  crash recovery starts in timeline 30 and has target timeline 93
2023-05-17 07:06:28.596 UTC [9025] LOG:  redo starts at 4/8A3150D8
2023-05-17 07:06:28.613 UTC [9025] LOG:  invalid record length at 4/8A5D5128: wanted 24, got 0
2023-05-17 07:06:28.613 UTC [9025] LOG:  redo done at 4/8A5D50F8
2023-05-17 07:06:28.654 UTC [9025] PANIC:  could not flush dirty data: Structure needs cleaning

2023-05-17 14:06:36,496 WARNING: Postgresql is not running.
2023-05-17 14:06:36,496 INFO: Lock owner: patroni-IP_3; I am patroni-IP_3
2023-05-17 14:06:36,512 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7145069559047319590
  Database cluster state: shutting down
  pg_control last modified: Wed May 17 14:06:28 2023
  Latest checkpoint location: 4/8A4117B8
  Latest checkpoint's REDO location: 4/8A3150D8
  Latest checkpoint's REDO WAL file: 0000001E000000040000008A
  Latest checkpoint's TimeLineID: 30
  Latest checkpoint's PrevTimeLineID: 30
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:9182574
  Latest checkpoint's NextOID: 216022
  Latest checkpoint's NextMultiXactId: 375408
  Latest checkpoint's NextMultiOffset: 1532251
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 9182574
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Wed May 17 13:36:33 2023
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 4/8A5D5128
  Min recovery ending loc's timeline: 93
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: on
  max_connections setting: 400
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 0
  Mock authentication nonce: f297655fad1846ebdda6a1ce18a2589b32c8d612deb1339e34d5a18480258962

PostgreSQL log files

2023-05-17 06:42:10.121 UTC [6543] LOG:  starting PostgreSQL 13.6 (Debian 13.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-05-17 06:42:10.121 UTC [6543] LOG:  listening on IPv4 address "IP_3", port 8432
2023-05-17 06:42:10.134 UTC [6543] LOG:  listening on Unix socket "/home/vt_app/postgres-active-passive/postgres-data/unix/.s.PGSQL.8432"
2023-05-17 06:42:10.153 UTC [6546] LOG:  database system shutdown was interrupted; last known up at 2023-05-17 06:42:01 UTC
2023-05-17 06:42:11.118 UTC [6548] FATAL:  the database system is starting up
2023-05-17 06:42:11.171 UTC [6550] FATAL:  the database system is starting up
2023-05-17 06:42:11.624 UTC [6546] WARNING:  specified neither primary_conninfo nor restore_command
2023-05-17 06:42:11.624 UTC [6546] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-05-17 06:42:11.624 UTC [6546] LOG:  entering standby mode
2023-05-17 06:42:11.630 UTC [6546] LOG:  database system was not properly shut down; automatic recovery in progress
2023-05-17 06:42:11.639 UTC [6546] LOG:  redo starts at 4/8A3150D8
2023-05-17 06:42:11.652 UTC [6546] LOG:  invalid record length at 4/8A5D4558: wanted 24, got 0
2023-05-17 06:42:11.658 UTC [6546] LOG:  consistent recovery state reached at 4/8A5D4558
2023-05-17 06:42:11.659 UTC [6543] LOG:  database system is ready to accept read only connections
2023-05-17 06:42:12.290 UTC [6546] LOG:  received promote request
2023-05-17 06:42:12.290 UTC [6546] LOG:  redo done at 4/8A5D4530
2023-05-17 06:42:12.290 UTC [6546] LOG:  last completed transaction was at log time 2023-05-17 06:41:47.933268+00
2023-05-17 06:42:12.311 UTC [6546] LOG:  selected new timeline ID: 31
2023-05-17 06:42:12.391 UTC [6546] LOG:  archive recovery complete
2023-05-17 06:42:12.458 UTC [6543] LOG:  database system is ready to accept connections
2023-05-17 06:42:13.366 UTC [6551] PANIC:  could not flush dirty data: Structure needs cleaning
2023-05-17 06:42:13.569 UTC [6543] LOG:  checkpointer process (PID 6551) was terminated by signal 6: Aborted
2023-05-17 06:42:13.569 UTC [6543] LOG:  terminating any other active server processes
2023-05-17 06:42:13.569 UTC [6569] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:13.569 UTC [6569] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:13.569 UTC [6569] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:13.569 UTC [6564] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:13.569 UTC [6564] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:13.569 UTC [6564] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:13.569 UTC [6556] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:13.569 UTC [6556] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:13.569 UTC [6556] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:13.569 UTC [6562] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:13.569 UTC [6562] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:13.569 UTC [6562] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:13.571 UTC [6565] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:13.571 UTC [6565] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:13.571 UTC [6565] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:13.573 UTC [6543] LOG:  all server processes terminated; reinitializing


2023-05-17 06:42:33.675 UTC [6580] LOG:  starting PostgreSQL 13.6 (Debian 13.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-05-17 06:42:33.675 UTC [6580] LOG:  listening on IPv4 address "IP_3", port 8432
2023-05-17 06:42:33.688 UTC [6580] LOG:  listening on Unix socket "/home/vt_app/postgres-active-passive/postgres-data/unix/.s.PGSQL.8432"
2023-05-17 06:42:33.707 UTC [6583] LOG:  database system shutdown was interrupted; last known up at 2023-05-17 06:42:24 UTC
2023-05-17 06:42:34.672 UTC [6585] FATAL:  the database system is starting up
2023-05-17 06:42:34.722 UTC [6587] FATAL:  the database system is starting up
2023-05-17 06:42:35.776 UTC [6589] FATAL:  the database system is starting up
2023-05-17 06:42:36.028 UTC [6583] WARNING:  specified neither primary_conninfo nor restore_command
2023-05-17 06:42:36.028 UTC [6583] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-05-17 06:42:36.028 UTC [6583] LOG:  entering standby mode
2023-05-17 06:42:36.044 UTC [6583] LOG:  redo starts at 4/8A3150D8
2023-05-17 06:42:36.059 UTC [6583] LOG:  consistent recovery state reached at 4/8A5D4588
2023-05-17 06:42:36.059 UTC [6583] LOG:  invalid record length at 4/8A5D4588: wanted 24, got 0
2023-05-17 06:42:36.060 UTC [6580] LOG:  database system is ready to accept read only connections
2023-05-17 06:42:36.902 UTC [6583] LOG:  received promote request
2023-05-17 06:42:36.902 UTC [6583] LOG:  redo done at 4/8A5D4558
2023-05-17 06:42:36.902 UTC [6583] LOG:  last completed transaction was at log time 2023-05-17 06:41:47.933268+00
2023-05-17 06:42:36.921 UTC [6583] LOG:  selected new timeline ID: 32
2023-05-17 06:42:36.997 UTC [6583] LOG:  archive recovery complete
2023-05-17 06:42:37.072 UTC [6580] LOG:  database system is ready to accept connections
2023-05-17 06:42:37.976 UTC [6590] PANIC:  could not flush dirty data: Structure needs cleaning
2023-05-17 06:42:38.180 UTC [6580] LOG:  checkpointer process (PID 6590) was terminated by signal 6: Aborted
2023-05-17 06:42:38.180 UTC [6580] LOG:  terminating any other active server processes
2023-05-17 06:42:38.180 UTC [6608] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:38.180 UTC [6608] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:38.180 UTC [6608] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:38.180 UTC [6606] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:38.180 UTC [6606] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:38.180 UTC [6606] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:38.180 UTC [6601] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:38.180 UTC [6601] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:38.180 UTC [6601] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:38.181 UTC [6607] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:38.181 UTC [6607] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:38.181 UTC [6607] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:38.182 UTC [6595] WARNING:  terminating connection because of crash of another server process
2023-05-17 06:42:38.182 UTC [6595] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-17 06:42:38.182 UTC [6595] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-05-17 06:42:38.185 UTC [6580] LOG:  all server processes terminated; reinitializing
2023-05-17 06:42:38.322 UTC [6609] LOG:  database system was interrupted; last known up at 2023-05-17 06:42:37 UTC
2023-05-17 06:42:38.322 UTC [6610] FATAL:  the database system is in recovery mode
2023-05-17 06:42:38.382 UTC [6611] FATAL:  the database system is in recovery mode
2023-05-17 06:42:38.581 UTC [6609] LOG:  database system was not properly shut down; automatic recovery in progress
2023-05-17 06:42:38.581 UTC [6609] LOG:  crash recovery starts in timeline 30 and has target timeline 32
2023-05-17 06:42:38.590 UTC [6609] LOG:  redo starts at 4/8A3150D8
2023-05-17 06:42:38.604 UTC [6609] LOG:  invalid record length at 4/8A5D45B8: wanted 24, got 0
2023-05-17 06:42:38.604 UTC [6609] LOG:  redo done at 4/8A5D4588
2023-05-17 06:42:38.664 UTC [6609] PANIC:  could not flush dirty data: Structure needs cleaning
2023-05-17 06:42:38.871 UTC [6580] LOG:  startup process (PID 6609) was terminated by signal 6: Aborted
2023-05-17 06:42:38.872 UTC [6580] LOG:  aborting startup due to startup process failure

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 17

Most upvoted comments

Errors like the ones in your logs are an indicator of serious problems with your setup: faulty hardware, corrupted data, or maybe even both. After a failover Postgres may not crash immediately, but that does not mean everything is good; the problem may still manifest itself a bit later. It is better to notice/analyze/fix the problem with your setup early, because the longer you postpone it, the more unrecoverable data you may end up with.
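
For context on this advice: "Structure needs cleaning" is the operating-system error EUCLEAN, which on Linux typically means the filesystem under the data directory is corrupted. A generic diagnostic sketch (not Patroni-specific; assumes Linux and permission to read the kernel log):

    import errno
    import os
    import subprocess

    # The errno behind the PANIC message above (Linux-specific).
    print(os.strerror(errno.EUCLEAN))   # prints: Structure needs cleaning

    # Scan the kernel ring buffer for typical filesystem/disk error lines;
    # adjust the patterns for your filesystem (ext4 shown here).
    dmesg = subprocess.run(['dmesg'], capture_output=True, text=True).stdout
    for line in dmesg.splitlines():
        if 'EXT4-fs error' in line or 'I/O error' in line:
            print(line)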