patroni: patronictl list not displaying a node

Hi Team,

Below are the versions:

Patroni : 1.6.5
PostgreSQL : 11.8
OS : Ubuntu 18.04

I have come across a strange issue and need some help with it. One of my nodes, “pg3”, does not appear in the patronictl list output, even though the node seems to be active and data is being replicated to it properly. It also doesn't show up in the members list when I check it in etcd. The issue is resolved only when I restart the service.

postgres@q-sw-pgdb-r05:/var/log/postgresql$ patronictl -c /etc/patroni/deepthought1.yml list
+ Cluster: deepthought (6828106004191544221) ----+-----------+
| Member |      Host     |  Role  |  State  | TL | Lag in MB |
+--------+---------------+--------+---------+----+-----------+
|  pg1   | 10.47.226.202 | Leader | running |  4 |           |
|  pg2   | 10.47.226.203 |        | running |  4 |         0 |
+--------+---------------+--------+---------+----+-----------+

postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 115230
usesysid         | 16384
usename          | replicator
application_name | pg2
client_addr      | 10.47.226.203
client_hostname  |
client_port      | 50754
backend_start    | 2020-05-25 23:21:03.287489-07
backend_xmin     |
state            | streaming
sent_lsn         | 7/DA04ABF8
write_lsn        | 7/DA04ABF8
flush_lsn        | 7/DA04ABF8
replay_lsn       | 7/DA04ABF8
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
-[ RECORD 2 ]----+------------------------------
pid              | 27783
usesysid         | 16384
usename          | replicator
application_name | pg3
client_addr      | 10.47.226.80
client_hostname  |
client_port      | 41776
backend_start    | 2020-05-28 03:17:51.67092-07
backend_xmin     |
state            | streaming
sent_lsn         | 7/DA04ABF8
write_lsn        | 7/DA04ABF8
flush_lsn        | 7/DA04ABF8
replay_lsn       | 7/DA04ABF8
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

However, after a few hours the same issue crops up again, and I am struggling to find its cause in the PostgreSQL and Patroni logs.
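One way to spot this state from the outside is to compare the members that patronictl list reports with the standbys that are actually streaming according to pg_stat_replication on the leader. The sketch below is a hypothetical helper (the function names and how the input is passed in are assumptions, not part of Patroni): it parses the table format shown above and returns every application_name that streams but is not registered as a member.

```python
def listed_members(patronictl_output: str) -> set[str]:
    """Extract member names from the table printed by `patronictl list`."""
    members = set()
    for line in patronictl_output.splitlines():
        line = line.strip()
        # Data rows start with '|'; border and title rows start with '+'.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row; keep the first column (Member) of data rows.
        if cells and cells[0] and cells[0] != "Member":
            members.add(cells[0])
    return members


def unregistered_replicas(patronictl_output: str,
                          streaming_names: list[str]) -> list[str]:
    """Return standbys that stream from the leader but are not listed as members."""
    members = listed_members(patronictl_output)
    return sorted(name for name in streaming_names if name not in members)


if __name__ == "__main__":
    table = """\
+ Cluster: deepthought (6828106004191544221) ----+-----------+
| Member |      Host     |  Role  |  State  | TL | Lag in MB |
+--------+---------------+--------+---------+----+-----------+
|  pg1   | 10.47.226.202 | Leader | running |  4 |           |
|  pg2   | 10.47.226.203 |        | running |  4 |         0 |
+--------+---------------+--------+---------+----+-----------+
"""
    # application_name values as reported by pg_stat_replication on the leader
    print(unregistered_replicas(table, ["pg2", "pg3"]))  # ['pg3']
```

With the output from this issue, pg3 is flagged: it streams from the leader while its member key is missing from the DCS.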

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 19 (1 by maintainers)

Most upvoted comments

It looks like Patroni on pg3 is not running; you have to figure out why. Also, pg1 should periodically write "Failed to drop replication slot 'pg3'" into its logs.
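The reasoning in this comment can be illustrated with a simplified model: the leader wants one physical slot per other member it sees in the DCS, so once pg3's member key disappears, the pg3 slot becomes a candidate for removal, but the drop fails because pg3 is still streaming and its slot is active. This is a hypothetical sketch, not Patroni's actual code; sync_slots and Slot are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    active: bool  # True while a walsender (a streaming standby) uses the slot


def sync_slots(leader: str, dcs_members: set[str],
               existing: dict[str, Slot]) -> list[str]:
    """Drop slots of members that are no longer in the DCS; return error log lines.

    Simplified model: one physical slot per non-leader member, and an
    active slot is never dropped (PostgreSQL refuses to drop it).
    """
    desired = dcs_members - {leader}
    errors = []
    # sorted() builds a list first, so deleting from the dict below is safe.
    for name in sorted(set(existing) - desired):
        if existing[name].active:
            errors.append(f"Failed to drop replication slot '{name}'")
        else:
            del existing[name]
    return errors


if __name__ == "__main__":
    slots = {"pg2": Slot("pg2", active=True), "pg3": Slot("pg3", active=True)}
    # pg3's member key has vanished from etcd, but it still streams from pg1.
    for line in sync_slots("pg1", {"pg1", "pg2"}, slots):
        print(line)  # Failed to drop replication slot 'pg3'
```

In this model the error repeats for as long as pg3 keeps streaming without its Patroni process re-registering the member key, which matches the symptom of a replica that works but is missing from patronictl list.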

Which situation causes it to periodically write "Failed to drop replication slot" into the logs? I notice that if _schedule_load_slots = False and the slot is not active, it will not call drop_replication_slot; if it has called drop_replication_slot but the call failed because the slot is still active, it tries again and again. In my environment I have a 3-node Patroni cluster, and I also have some other PostgreSQL instances (not belonging to the Patroni cluster) that use logical replication with the Patroni leader node. I find that the Patroni leader node tries to drop these logical replication slots and reports "Failed to drop replication slot". The Patroni version is v1.6.0 and the PostgreSQL version is 10.
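The retry behavior described in this comment can be sketched as a small simulation. It is written from the description above under stated assumptions, not taken from Patroni: run_cycles, known, and the slot name sub_slot are hypothetical, and schedule_load_slots only stands in for _schedule_load_slots. The model assumes the leader's wanted slots are derived from the cluster members, so a logical slot created by an outside subscriber looks stale; while that slot stays active, every cycle reloads the slot list, fails to drop it, and schedules another reload.

```python
def run_cycles(desired: set[str], pg_slots: dict[str, bool],
               cycles: int) -> list[str]:
    """Simulate a leader's slot cleanup over several HA loop cycles.

    desired: slot names the leader wants (in this model, derived from members)
    pg_slots: slot name -> "active" flag, as PostgreSQL would report it
    """
    log = []
    known: set[str] = set()
    schedule_load_slots = True  # hypothetical stand-in for _schedule_load_slots
    for cycle in range(1, cycles + 1):
        if schedule_load_slots:
            known = set(pg_slots)  # reload the real slot list from PostgreSQL
            schedule_load_slots = False
        for name in sorted(known - desired):
            if pg_slots[name]:
                # An active slot cannot be dropped: log it and reload next cycle.
                log.append(f"cycle {cycle}: Failed to drop replication slot '{name}'")
                schedule_load_slots = True
            else:
                del pg_slots[name]
        # After syncing, the cached view is just the desired set, so nothing
        # is retried unless a reload was scheduled.
        known = set(desired)
    return log


if __name__ == "__main__":
    # Physical slots for replicas pg2 and pg3, plus a hypothetical logical
    # slot "sub_slot" used by a subscriber outside the Patroni cluster.
    slots = {"pg2": True, "pg3": True, "sub_slot": True}
    for line in run_cycles({"pg2", "pg3"}, slots, cycles=3):
        print(line)
    # cycle 1: Failed to drop replication slot 'sub_slot'
    # cycle 2: Failed to drop replication slot 'sub_slot'
    # cycle 3: Failed to drop replication slot 'sub_slot'
```

If sub_slot is inactive instead, the model drops it on the first cycle and logs nothing, which is why an external logical slot is at risk whenever its subscriber is disconnected.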