patroni: A cluster with a single "nofailover" node left will never recover

Describe the bug

If node A, tagged nofailover, is the only node left in a cluster and you then bring back node B, node B never becomes the leader until you fail over the cluster manually. But even after a manual failover, the cluster remains broken.

To Reproduce

  1. Create a cluster of 4 nodes: vas2, vas3, vas4, vas5
  2. Tag vas4 with nofailover
  3. Power off nodes vas2, vas3, and vas5
  4. Watch vas4 remain a lone Replica with no Leader in the cluster (expected)
  5. Restore power to vas3 and see that it never becomes Leader (unexpected)
  6. Run a failover manually to recover and watch vas3 become Leader, but with diverged timelines

This is probably a broken cluster.

Expected behavior

I expect vas3 to become the new Leader immediately after power-on, and vas4 to start following it.

Environment

  • Patroni version: 2.1.2-2.pgdg100+1
  • PostgreSQL version: 13.6
  • DCS (and its version): Consul v1.11.3
  • OS: Debian GNU/Linux 10 (buster)

patronictl show-config

loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    max_connections: 960
    max_locks_per_transaction: 256
    max_prepared_transactions: 960
    max_replication_slots: 10
    max_wal_senders: 10
    max_worker_processes: 16
    track_commit_timestamp: 'on'
    wal_keep_size: 16GB
  pg_hba:
  - local   all             all                               trust
  - host    all             all       0.0.0.0/0               md5
  - local   replication   repmgr                              trust
  - host    replication   repmgr      0.0.0.0/0               md5
  remove_data_directory_on_diverged_timelines: false
  remove_data_directory_on_rewind_failure: true
  use_pg_rewind: false
retry_timeout: 10
synchronous_mode: true
synchronous_node_count: 3
ttl: 30

Have you checked Patroni logs?

The unfortunate nofailover node writes:

INFO: no action. I am (vas4), a secondary, and following a leader (vas5)

(vas5 is not even there; it was powered off.)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 31 (17 by maintainers)

Most upvoted comments

No, not everything, just the former leader is lost.

Please read the documentation: https://patroni.readthedocs.io/en/latest/replication_modes.html#synchronous-mode-implementation

A node that is not the leader or current synchronous standby is not allowed to promote itself automatically.

You deliberately tagged one node with nofailover but didn’t tag it with nosync, so you end up in a situation where this node is synchronous but is not allowed to be promoted automatically. That’s not Patroni's fault; it acts according to spec.
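As a rough illustration of that advice, the tags section of the node's local Patroni configuration could combine both tags. This is only a sketch: the file name and the comments are assumptions, and the rest of the node's configuration is omitted.

```yaml
# Local Patroni configuration on vas4 (file name assumed, e.g. patroni.yml).
# Tags are per-node settings and are not part of the DCS config printed by
# `patronictl show-config`.
tags:
  nofailover: true   # this node must never be promoted automatically
  nosync: true       # this node must never be picked as a synchronous standby
```

With both tags set, the node should no longer end up as a synchronous standby that cannot be promoted.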

If you want to recover from this situation, the right thing to do is to temporarily remove the nofailover tag from that node. The node will then be promoted, and you can set the tag back afterwards.
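A minimal sketch of that temporary change, assuming the tag lives in the node's local Patroni configuration file and that Patroni is told to re-read it afterwards (for example with `patronictl reload`):

```yaml
# Local Patroni configuration on the nofailover node (vas4 in this report).
# Temporary state for recovery only.
tags:
  nofailover: false  # allow promotion for now; set back to true once a leader exists
```

Once the node has been promoted and the cluster has a leader again, the tag can be restored to `true` and the configuration reloaded once more.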