patroni: Recovery error when using custom bootstrap and recovery_target_action: pause
Describe the bug In issue 1891 a change was made that breaks patroni behavior when restoring via custom bootstrap.
Expected behavior In postgres-Mon.log
recovery stopping before commit of transaction 1858, time 2023-02-22 12:51:26.046508+03
This is the point in time I need to recover.
I go to the database and see if the necessary data has been restored, if not, stop patroni, change recovery_target_time and start patroni.
And I can repeat this several times, moving the restore point forward without having to restore the entire cluster all over again.
After reaching the desired timestamp, I simply execute the psql command SELECT pg_xlog_replay_resume()
and postgres exits recovery mode and becomes the primary node.
If I rename data.failed -> data
and start the patroni service, then postgres starts up, but immediately does promote, which I don’t need, because Otherwise, I can not do the recovery procedure described above.
To Reproduce Environment
- Patroni version: 2.0.2+
- PostgreSQL version: 9.6+
- DCS (and its version): etcd 3.5.1
- postgres backup util: wal-g 2.0.1
Patroni configuration file
bootstrap:
method: wal-g
wal-g:
command: /usr/local/bin/wal_g_bootstrap.sh
keep_existing_recovery_conf: False
recovery_conf:
recovery_target_time: '2023-02-22 12:51:26'
recovery_target_action: pause
recovery_target_timeline: 4
restore_command: '/usr/local/bin/wal-g wal-fetch %f %p'
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 30
synchronous_mode: false
synchronous_mode_strict: false
postgresql:
use_pg_rewind: true
parameters:
archive_mode: 'on'
archive_command: '/usr/local/bin/wal-g wal-push %p'
bootstrap script wal_g_bootstrap.sh
#!/usr/bin/env bash
while getopts ":-:" optchar; do
[[ "${optchar}" == "-" ]] || continue
case "${OPTARG}" in
datadir=* )
DATA_DIR=${OPTARG#*=}
;;
scope=* )
SCOPE=${OPTARG#*=}
;;
esac
done
/usr/local/bin/wal-g backup-fetch $DATA_DIR base_000000040000000000000017
patroni.log
2023-02-27 09:59:34,412 INFO: Selected new etcd server http://127.0.0.1:2379
2023-02-27 09:59:34,430 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-02-27 09:59:34,484 INFO: Lock owner: None; I am cluster_name
2023-02-27 09:59:34,593 INFO: trying to bootstrap a new cluster
2023-02-27 09:59:34,593 INFO: Running custom bootstrap script: /usr/local/bin/wal_g_bootstrap.sh
2023-02-27 09:59:44,434 INFO: Lock owner: None; I am cluster_name
2023-02-27 09:59:44,434 INFO: not healthy enough for leader race
2023-02-27 09:59:44,735 INFO: bootstrap in progress
2023-02-27 09:59:51,433 INFO: postmaster pid=29043
2023-02-27 09:59:54,434 INFO: Lock owner: None; I am cluster_name
2023-02-27 09:59:54,434 INFO: not healthy enough for leader race
2023-02-27 09:59:54,714 INFO: bootstrap in progress
2023-02-27 09:59:56,569 INFO: removing initialize key after failed attempt to bootstrap the cluster
2023-02-27 09:59:56,581 INFO: renaming data directory to /var/lib/pgsql/9.6/data.failed
syslog
INFO: 2023/02/27 09:59:34.732530 Selecting the backup with name base_000000040000000000000017...
INFO: 2023/02/27 09:59:47.686658 Finished extraction of part_002.tar.br
INFO: 2023/02/27 09:59:47.829682 Finished extraction of part_004.tar.br
INFO: 2023/02/27 09:59:50.840466 Finished extraction of part_001.tar.br
INFO: 2023/02/27 09:59:50.985037 Finished extraction of pg_control.tar.br
INFO: 2023/02/27 09:59:50.985111
Backup extraction complete.
2023-02-27 09:59:51 MSK [29043]: [1-1] 63fc54e7.7173 0 LOG: redirecting log output to logging collector process
2023-02-27 09:59:51 MSK [29043]: [2-1] 63fc54e7.7173 0 HINT: Future log output will appear in directory "/var/lib/pgsql/9.6/logs".
localhost:5432 - rejecting connections
localhost:5432 - rejecting connections
localhost:5432 - rejecting connections
localhost:5432 - rejecting connections
localhost:5432 - rejecting connections
localhost:5432 - rejecting connections
localhost:5432 - no response
Traceback (most recent call last):
File "/usr/bin/patroni", line 11, in <module>
load_entry_point('patroni==3.0.0', 'console_scripts', 'patroni')()
File "/usr/lib/python3.6/site-packages/patroni/__main__.py", line 144, in main
return patroni_main()
File "/usr/lib/python3.6/site-packages/patroni/__main__.py", line 136, in patroni_main
abstract_main(Patroni, schema)
File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 108, in abstract_main
controller.run()
File "/usr/lib/python3.6/site-packages/patroni/__main__.py", line 106, in run
super(Patroni, self).run()
File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 65, in run
self._run_cycle()
File "/usr/lib/python3.6/site-packages/patroni/__main__.py", line 109, in _run_cycle
logger.info(self.ha.run_cycle())
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1771, in run_cycle
info = self._run_cycle()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1593, in _run_cycle
return self.post_bootstrap()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1484, in post_bootstrap
self.cancel_initialization()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1477, in cancel_initialization
raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
postgresql-Mon.log
2023-02-27 09:59:51 MSK [29046]: [1-1] 63fc54e7.7176 0 LOG: database system was interrupted; last known up at 2023-02-22 12:33:23 MSK
2023-02-27 09:59:51 MSK [29046]: [2-1] 63fc54e7.7176 0 LOG: creating missing WAL directory "pg_xlog/archive_status"
2023-02-27 09:59:51 MSK [29047]: [1-1] 63fc54e7.7177 0 [unknown] [unknown] [unknown] 127.0.0.1 LOG: connection received: host=127.0.0.1 port=45240
2023-02-27 09:59:51 MSK [29047]: [2-1] 63fc54e7.7177 0 postgres postgres [unknown] 127.0.0.1 FATAL: the database system is starting up
2023-02-27 09:59:51 MSK [29049]: [1-1] 63fc54e7.7179 0 [unknown] [unknown] [unknown] 127.0.0.1 LOG: connection received: host=127.0.0.1 port=45242
2023-02-27 09:59:51 MSK [29049]: [2-1] 63fc54e7.7179 0 postgres postgres [unknown] 127.0.0.1 FATAL: the database system is starting up
2023-02-27 09:59:51 MSK [29046]: [3-1] 63fc54e7.7176 0 LOG: restored log file "00000004.history" from archive
2023-02-27 09:59:51 MSK [29046]: [4-1] 63fc54e7.7176 0 LOG: starting point-in-time recovery to 2023-02-22 12:51:26+03
2023-02-27 09:59:51 MSK [29046]: [5-1] 63fc54e7.7176 0 LOG: restored log file "00000004.history" from archive
2023-02-27 09:59:52 MSK [29046]: [6-1] 63fc54e7.7176 0 LOG: restored log file "000000040000000000000017" from archive
INFO: 2023/02/27 09:59:52.101640 WAL-prefetch file: 000000040000000000000018
INFO: 2023/02/27 09:59:52.112315 WAL-prefetch file: 000000040000000000000019
2023-02-27 09:59:52 MSK [29046]: [7-1] 63fc54e7.7176 0 LOG: redo starts at 0/17000028
2023-02-27 09:59:52 MSK [29046]: [8-1] 63fc54e7.7176 0 LOG: consistent recovery state reached at 0/170000F8
2023-02-27 09:59:52 MSK [29046]: [9-1] 63fc54e7.7176 0 LOG: restored log file "000000040000000000000018" from archive
INFO: 2023/02/27 09:59:52.434774 WAL-prefetch file: 00000004000000000000001A
2023-02-27 09:59:52 MSK [29105]: [1-1] 63fc54e8.71b1 0 [unknown] [unknown] [unknown] 127.0.0.1 LOG: connection received: host=127.0.0.1 port=45260
2023-02-27 09:59:52 MSK [29105]: [2-1] 63fc54e8.71b1 0 postgres postgres [unknown] 127.0.0.1 FATAL: the database system is starting up
2023-02-27 09:59:52 MSK [29046]: [10-1] 63fc54e7.7176 0 LOG: restored log file "000000040000000000000019" from archive
INFO: 2023/02/27 09:59:52.854834 WAL-prefetch file: 00000004000000000000001B
2023-02-27 09:59:53 MSK [29046]: [11-1] 63fc54e7.7176 0 LOG: restored log file "00000004000000000000001A" from archive
INFO: 2023/02/27 09:59:53.169198 WAL-prefetch file: 00000004000000000000001C
2023-02-27 09:59:53 MSK [29142]: [1-1] 63fc54e9.71d6 0 [unknown] [unknown] [unknown] 127.0.0.1 LOG: connection received: host=127.0.0.1 port=45268
2023-02-27 09:59:53 MSK [29142]: [2-1] 63fc54e9.71d6 0 postgres postgres [unknown] 127.0.0.1 FATAL: the database system is starting up
2023-02-27 09:59:53 MSK [29046]: [12-1] 63fc54e7.7176 0 LOG: restored log file "00000004000000000000001B" from archive
INFO: 2023/02/27 09:59:53.722749 WAL-prefetch file: 00000004000000000000001D
2023-02-27 09:59:54 MSK [29161]: [1-1] 63fc54ea.71e9 0 [unknown] [unknown] [unknown] 127.0.0.1 LOG: connection received: host=127.0.0.1 port=45274
2023-02-27 09:59:54 MSK [29161]: [2-1] 63fc54ea.71e9 0 postgres postgres [unknown] 127.0.0.1 FATAL: the database system is starting up
2023-02-27 09:59:54 MSK [29046]: [13-1] 63fc54e7.7176 0 LOG: recovery stopping before commit of transaction 1858, time 2023-02-22 12:51:26.046508+03
2023-02-27 09:59:54 MSK [29043]: [3-1] 63fc54e7.7173 0 LOG: shutdown at recovery target
2023-02-27 09:59:54 MSK [29085]: [1-1] 63fc54e8.719d 0 LOG: shutting down
2023-02-27 09:59:54 MSK [29085]: [2-1] 63fc54e8.719d 0 LOG: restartpoint starting: shutdown immediate
2023-02-27 09:59:55 MSK [29163]: [1-1] 63fc54eb.71eb 0 [unknown] [unknown] [unknown] 127.0.0.1 LOG: connection received: host=127.0.0.1 port=45278
2023-02-27 09:59:55 MSK [29163]: [2-1] 63fc54eb.71eb 0 postgres postgres [unknown] 127.0.0.1 FATAL: the database system is shutting down
2023-02-27 09:59:56 MSK [29085]: [31] 63fc54e8.719d 0 LOG: restartpoint complete: wrote 9622 buffers (7.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.158 s, sync=0.972 s, total=1.165 s; sync files=37, longest=0.970 s, average=0.026 s; distance=74631 kB, estimate=74631 kB
2023-02-27 09:59:56 MSK [29085]: [4-1] 63fc54e8.719d 0 LOG: recovery restart point at 0/1B8E1D18
2023-02-27 09:59:56 MSK [29085]: [5-1] 63fc54e8.719d 0 DETAIL: last completed transaction was at log time 2023-02-22 12:33:41.56509+03
2023-02-27 09:59:56 MSK [29043]: [4-1] 63fc54e7.7173 0 LOG: database system is shut down
After my changes everything works as expected.
diff --git a/patroni/postgresql/config.py b/patroni/postgresql/config.py
index 27fd4cf..a93ff8e 100644
--- a/patroni/postgresql/config.py
+++ b/patroni/postgresql/config.py
@@ -1098,7 +1098,7 @@ class ConfigHandler(object):
# like max_connections are increased, therefore we disable hot_standby.
if self._postgresql.bootstrap.running_custom_bootstrap and \
(self._postgresql.bootstrap.keep_existing_recovery_conf or self._recovery_conf):
- effective_configuration['hot_standby'] = 'off'
+ effective_configuration['hot_standby'] = 'on'
self._postgresql.set_pending_restart(True)
return effective_configuration
**syslog**
Backup extraction complete.
localhost:5432 - no response
2023-02-27 10:31:47 MSK [29946]: [1-1] 63fc5c62.74fa 0 LOG: redirecting log output to logging collector process
2023-02-27 10:31:47 MSK [29946]: [2-1] 63fc5c62.74fa 0 HINT: Future log output will appear in directory "/var/lib/pgsql/9.6/logs".
localhost:5432 - rejecting connections
localhost:5432 - rejecting connections
localhost:5432 - accepting connections
postgresql-Mon.log
2023-02-27 10:31:50 MSK [29949]: [13-1] 63fc5c63.74fd 0 LOG: recovery stopping before commit of transaction 1858, time 2023-02-22 12:51:26.046508+03
2023-02-27 10:31:50 MSK [29949]: [14-1] 63fc5c63.74fd 0 LOG: recovery has paused
2023-02-27 10:31:50 MSK [29949]: [15-1] 63fc5c63.74fd 0 HINT: Execute pg_xlog_replay_resume() to continue.
After bash psql -c 'SELECT pg_xlog_replay_resume()'
patroni.log
2023-02-27 10:34:08,983 INFO: waiting for end of recovery after bootstrap
2023-02-27 10:34:19,084 INFO: running post_bootstrap
2023-02-27 10:34:19,891 INFO: closed patroni connection to the postgresql cluster
2023-02-27 10:34:20,315 INFO: postmaster pid=30290
2023-02-27 10:34:21,415 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2023-02-27 10:34:21,415 INFO: establishing a new patroni connection to the postgres cluster
2023-02-27 10:34:21,665 INFO: initialized a new cluster
2023-02-27 10:34:21,824 INFO: no action. I am (cluster_name), the leader with the lock
syslog
2023-02-27 10:34:20 MSK [30290]: [1-1] 63fc5cfc.7652 0 LOG: redirecting log output to logging collector process
2023-02-27 10:34:20 MSK [30290]: [2-1] 63fc5cfc.7652 0 HINT: Future log output will appear in directory "/var/lib/pgsql/9.6/logs".
localhost:5432 - rejecting connections
localhost:5432 - rejecting connections
localhost:5432 - accepting connections
postgresql-Mon.log
2023-02-27 10:34:09 MSK [29949]: [16-1] 63fc5c63.74fd 0 LOG: redo done at 0/1B8E3748
2023-02-27 10:34:09 MSK [29949]: [17-1] 63fc5c63.74fd 0 LOG: last completed transaction was at log time 2023-02-22 12:33:41.56509+03
2023-02-27 10:34:10 MSK [29949]: [18-1] 63fc5c63.74fd 0 LOG: restored log file "00000005.history" from archive
ERROR: 2023/02/27 10:34:10.334171 Archive '00000006.history' does not exist.
2023-02-27 10:34:10 MSK [29949]: [19-1] 63fc5c63.74fd 0 LOG: selected new timeline ID: 6
2023-02-27 10:34:10 MSK [29949]: [20-1] 63fc5c63.74fd 0 LOG: restored log file "00000004.history" from archive
2023-02-27 10:34:10 MSK [29949]: [21-1] 63fc5c63.74fd 0 LOG: archive recovery complete
2023-02-27 10:34:10 MSK [29989]: [1-1] 63fc5c64.7525 0 LOG: checkpoint starting: end-of-recovery immediate wait
2023-02-27 10:34:11 MSK [29989]: [2-1] 63fc5c64.7525 0 LOG: checkpoint complete: wrote 9622 buffers (7.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.201 s, sync=0.419 s, total=0.676 s; sync files=37, longest=0
.416 s, average=0.011 s; distance=74637 kB, estimate=74637 kB
2023-02-27 10:34:11 MSK [29949]: [22-1] 63fc5c63.74fd 0 LOG: MultiXact member wraparound protections are now enabled
2023-02-27 10:34:11 MSK [30280]: [1-1] 63fc5cf3.7648 0 LOG: autovacuum launcher started
2023-02-27 10:34:11 MSK [29946]: [4-1] 63fc5c62.74fa 0 LOG: database system is ready to accept connections
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 3
- Comments: 16
Commits related to this issue
- Set hot_standby=off only if recovery_target_action=promote During custom bootstrap the hot_standby is set to off to protect postgres from panicking and shutting down when some parameters like max_con... — committed to zalando/patroni by CyberDem0n a year ago
- Set hot_standby=off only if recovery_target_action=promote (#2570) During custom bootstrap the `hot_standby` is set to off to protect postgres from panicking and shutting down when some parameters li... — committed to zalando/patroni by CyberDem0n a year ago
What could we do - set
hot_standby
tooff
only ifrecovery_target_action = 'promote'
And as I already explained, hot_standby is set to off for a good reason