postgres-operator: password authentication failed for user "postgres"
Hi,
I have been stuck on a DB connection problem for a while. I have noticed a few issues with the same symptom as mine, but none of their solutions works in my case. Can someone please give me some direction?
I have successfully installed the postgres-operator through Helm, created my cluster, and the cluster is running. However, I cannot connect to the DB even from within the Kubernetes cluster, so I opened a shell in the DB pod with kubectl exec -it pod/test-testdb-cluster-0 -- bash and connected with psql -U postgres. But I don't see the role and DB I specified in my YAML (testdbname and testdbowner):
List of roles
Role name | Attributes | Member of
------------+------------------------------------------------------------+-----------
admin | Create DB, Cannot login | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
robot_zmon | Cannot login | {}
standby | Replication | {}
zalandos | Create DB, Cannot login | {}
List of relations
Schema | Name | Type | Owner
--------+-------------------------+---------------+----------
public | failed_authentication_0 | view | postgres
public | failed_authentication_1 | view | postgres
public | failed_authentication_2 | view | postgres
public | failed_authentication_3 | view | postgres
public | failed_authentication_4 | view | postgres
public | failed_authentication_5 | view | postgres
public | failed_authentication_6 | view | postgres
public | failed_authentication_7 | view | postgres
public | pg_auth_mon | view | postgres
public | pg_stat_kcache | view | postgres
public | pg_stat_kcache_detail | view | postgres
public | pg_stat_statements | view | postgres
public | postgres_log | table | postgres
public | postgres_log_0 | foreign table | postgres
public | postgres_log_1 | foreign table | postgres
public | postgres_log_2 | foreign table | postgres
public | postgres_log_3 | foreign table | postgres
public | postgres_log_4 | foreign table | postgres
public | postgres_log_5 | foreign table | postgres
public | postgres_log_6 | foreign table | postgres
public | postgres_log_7 | foreign table | postgres
(21 rows)
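Note for readers hitting the same symptom: connecting as postgres via kubectl exec uses local trust authentication, so it bypasses the password check entirely. To test the credentials the operator actually generated, read them from the Kubernetes secret. A hedged sketch, assuming the operator's usual secret-naming convention (<username>.<clustername>.credentials.postgresql.acid.zalan.do) and the cluster/namespace names from this issue:

```shell
# Read the operator-generated password for the "postgres" user.
# Secret name follows the operator convention; adjust cluster and
# namespace if yours differ.
SECRET=postgres.test-testdb-cluster.credentials.postgresql.acid.zalan.do
PGPASSWORD=$(kubectl get secret "$SECRET" -n test \
  -o jsonpath='{.data.password}' | base64 -d)

# Attempt a real password-authenticated connection through the service:
kubectl run -n test psql-check --rm -it --image=postgres:12 --restart=Never -- \
  env PGPASSWORD="$PGPASSWORD" psql -h test-testdb-cluster -U postgres -c 'SELECT 1'
```

If this fails with "password authentication failed" while psql inside the pod works, the secret and pg_shadow have diverged.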
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/nginx-584d4f8b45-jwczj 1/1 Running 0 5d13h
pod/test-testdb-cluster-0 1/1 Running 0 5d13h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx LoadBalancer 172.17.154.178 192.168.1.247 80:30193/TCP 5d13h
service/test-testdb-cluster ClusterIP 172.16.104.168 <none> 5432/TCP 5d13h
service/test-testdb-cluster-config ClusterIP None <none> <none> 5d13h
service/test-testdb-cluster-repl ClusterIP 172.17.204.69 <none> 5432/TCP 5d13h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 1/1 1 1 5d13h
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-584d4f8b45 1 1 1 5d13h
NAME READY AGE
statefulset.apps/test-testdb-cluster 1/1 5d13h
Log from the cluster:
kubectl logs pod/test-testdb-cluster-0
2020-08-17 09:54:30,947 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2020-08-17 09:54:32,951 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2020-08-17 09:54:32,952 - bootstrapping - INFO - No meta-data available for this provider
2020-08-17 09:54:32,952 - bootstrapping - INFO - Looks like your running local
2020-08-17 09:54:32,981 - bootstrapping - INFO - Configuring bootstrap
2020-08-17 09:54:32,981 - bootstrapping - INFO - Configuring crontab
2020-08-17 09:54:32,982 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2020-08-17 09:54:32,982 - bootstrapping - INFO - Configuring pgbouncer
2020-08-17 09:54:32,982 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2020-08-17 09:54:32,982 - bootstrapping - INFO - Configuring wal-e
2020-08-17 09:54:32,982 - bootstrapping - INFO - Configuring pam-oauth2
2020-08-17 09:54:32,982 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2020-08-17 09:54:32,982 - bootstrapping - INFO - Configuring pgqd
2020-08-17 09:54:32,983 - bootstrapping - INFO - Configuring certificate
2020-08-17 09:54:32,983 - bootstrapping - INFO - Generating ssl certificate
2020-08-17 09:54:33,073 - bootstrapping - INFO - Configuring patroni
2020-08-17 09:54:33,081 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2020-08-17 09:54:33,081 - bootstrapping - INFO - Configuring log
2020-08-17 09:54:33,081 - bootstrapping - INFO - Configuring standby-cluster
2020-08-17 09:54:34,367 INFO: No PostgreSQL configuration items changed, nothing to reload.
2020-08-17 09:54:34,390 WARNING: Postgresql is not running.
2020-08-17 09:54:34,390 INFO: Lock owner: None; I am test-testdb-cluster-0
2020-08-17 09:54:34,394 INFO: pg_controldata:
pg_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6861875284178866242
Database cluster state: in production
pg_control last modified: Mon Aug 17 09:45:25 2020
Latest checkpoint location: 0/30C2A38
Latest checkpoint's REDO location: 0/30C2A00
Latest checkpoint's REDO WAL file: 000000020000000000000003
Latest checkpoint's TimeLineID: 2
Latest checkpoint's PrevTimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:882
Latest checkpoint's NextOID: 32768
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 480
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 882
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Mon Aug 17 09:45:11 2020
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 70a4642487623f0d3db3448c6cdccb40f243ae5656da8a85aa9f58c7992db5bc
2020-08-17 09:54:34,403 INFO: doing crash recovery in a single user mode
2020-08-17 09:54:37,807 WARNING: Postgresql is not running.
2020-08-17 09:54:37,807 INFO: Lock owner: None; I am test-testdb-cluster-0
2020-08-17 09:54:37,811 INFO: pg_controldata:
pg_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6861875284178866242
Database cluster state: shut down
pg_control last modified: Mon Aug 17 09:54:37 2020
Latest checkpoint location: 0/4000028
Latest checkpoint's REDO location: 0/4000028
Latest checkpoint's REDO WAL file: 000000020000000000000004
Latest checkpoint's TimeLineID: 2
Latest checkpoint's PrevTimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:882
Latest checkpoint's NextOID: 32768
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 480
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Mon Aug 17 09:54:37 2020
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 70a4642487623f0d3db3448c6cdccb40f243ae5656da8a85aa9f58c7992db5bc
2020-08-17 09:54:37,811 INFO: Lock owner: None; I am test-testdb-cluster-0
2020-08-17 09:54:37,823 INFO: Lock owner: None; I am test-testdb-cluster-0
2020-08-17 09:54:37,824 INFO: starting as a secondary
2020-08-17 09:54:38,050 INFO: postmaster pid=61
/var/run/postgresql:5432 - no response
2020-08-17 09:54:38 UTC [61]: [1-1] 5f3a53de.3d 0 LOG: Auto detecting pg_stat_kcache.linux_hz parameter...
2020-08-17 09:54:38 UTC [61]: [2-1] 5f3a53de.3d 0 LOG: pg_stat_kcache.linux_hz is set to 1000000
2020-08-17 09:54:38 UTC [61]: [3-1] 5f3a53de.3d 0 LOG: starting PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
2020-08-17 09:54:38 UTC [61]: [4-1] 5f3a53de.3d 0 LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-08-17 09:54:38 UTC [61]: [5-1] 5f3a53de.3d 0 LOG: listening on IPv6 address "::", port 5432
2020-08-17 09:54:38 UTC [61]: [6-1] 5f3a53de.3d 0 LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-08-17 09:54:38 UTC [61]: [7-1] 5f3a53de.3d 0 LOG: redirecting log output to logging collector process
2020-08-17 09:54:38 UTC [61]: [8-1] 5f3a53de.3d 0 HINT: Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2020-08-17 09:54:39,091 INFO: establishing a new patroni connection to the postgres cluster
2020-08-17 09:54:39,142 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2020-08-17 09:54:39,152 ERROR: Can not fetch local timeline and lsn from replication connection
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/__init__.py", line 685, in get_local_timeline_lsn_from_replication_connection
with self.get_replication_connection_cursor(**self.config.local_replication_address) as cur:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/__init__.py", line 679, in get_replication_connection_cursor
with get_connection_cursor(**conn_kwargs) as cur:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
with psycopg2.connect(**kwargs) as conn:
File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 127, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: password authentication failed for user "standby"
FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "standby", SSL off
2020-08-17 09:54:39,164 INFO: promoted self to leader by acquiring session lock
server promoting
2020-08-17 09:54:39,169 INFO: cleared rewind state after becoming the leader
2020-08-17 09:54:40,179 INFO: Lock owner: test-testdb-cluster-0; I am test-testdb-cluster-0
2020-08-17 09:54:40,185 INFO: Lock owner: test-testdb-cluster-0; I am test-testdb-cluster-0
2020-08-17 09:54:40,247 INFO: no action. i am the leader with the lock
SET
DO
DO
DO
NOTICE: extension "pg_auth_mon" already exists, skipping
CREATE EXTENSION
NOTICE: version "1.0" of extension "pg_auth_mon" is already installed
ALTER EXTENSION
GRANT
NOTICE: extension "pg_cron" already exists, skipping
CREATE EXTENSION
NOTICE: version "1.2" of extension "pg_cron" is already installed
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
REVOKE
GRANT
REVOKE
GRANT
GRANT
NOTICE: extension "file_fdw" already exists, skipping
CREATE EXTENSION
DO
NOTICE: relation "postgres_log" already exists, skipping
CREATE TABLE
GRANT
NOTICE: relation "postgres_log_0" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE: relation "postgres_log_1" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE: relation "postgres_log_2" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE: relation "postgres_log_3" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE: relation "postgres_log_4" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE: relation "postgres_log_5" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE: relation "postgres_log_6" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE: relation "postgres_log_7" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
RESET
SET
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to type zmon_utils.system_information
drop cascades to function zmon_utils.get_database_cluster_information()
drop cascades to function zmon_utils.get_database_cluster_system_information()
DROP SCHEMA
NOTICE: extension "plpython3u" already exists, skipping
DO
NOTICE: language "plpythonu" does not exist, skipping
DROP LANGUAGE
NOTICE: function plpython_call_handler() does not exist, skipping
DROP FUNCTION
NOTICE: function plpython_inline_handler(internal) does not exist, skipping
DROP FUNCTION
NOTICE: function plpython_validator(oid) does not exist, skipping
DROP FUNCTION
CREATE SCHEMA
GRANT
SET
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
GRANT
You are now connected to database "postgres" as user "postgres".
NOTICE: schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE: extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE: extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE: extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE: version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
NOTICE: schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
You are now connected to database "template1" as user "postgres".
NOTICE: schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE: extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE: extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE: extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE: version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
NOTICE: schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
2020-08-17 09:54:50,179 INFO: Lock owner: test-testdb-cluster-0; I am test-testdb-cluster-0
2020-08-17 09:54:50,234 INFO: no action. i am the leader with the lock
2020-08-17 09:55:00,179 INFO: Lock owner: test-testdb-cluster-0; I am test-testdb-cluster-0
2020-08-17 09:55:00,233 INFO: no action. i am the leader with the lock
2020-08-17 09:55:10,179 INFO: Lock owner: test-testdb-cluster-0; I am test-testdb-cluster-0
2020-08-17 09:55:10,192 INFO: no action. i am the leader with the lock
2020-08-17 09:55:20,179 INFO: Lock owner: test-testdb-cluster-0; I am test-testdb-cluster-0
2020-08-17 09:55:20,184 INFO: no action. i am the leader with the lock
2020-08-17 09:55:30,179 INFO: Lock owner: test-testdb-cluster-0; I am test-testdb-cluster-0
2020-08-17 09:55:30,226 INFO: no action. i am the leader with the lock
My yaml:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: test-testdb-cluster
  namespace: test
spec:
  teamId: "test"
  volume:
    size: 1Gi
  numberOfInstances: 1
  enableMasterLoadBalancer: false
  enableReplicaLoadBalancer: false
  users:
    # database owner
    testdbowner:
    - superuser
    - createdb
    testdb2owner:
    - createdb
    # role for application foo
    # foo_user: # or 'foo_user: []'
  # databases: name->owner
  databases:
    testdbname: testdbowner
    testdb2name: testdb2owner
  preparedDatabases:
    testdbname:
      extensions:
        postgis: data
  postgresql:
    version: "12"
Helm 3 installation command:
helm install postgres --namespace postgres charts/postgres-operator --set configKubernetes.cluster_domain=MY_DOMAIN --set configLoadBalancer.db_hosted_zone=db.MY_DOMAIN -f ./charts/postgres-operator/values-crd.yaml
I have tried 1.4.0, 1.5.0, and the master branch/tag. I have also tried installing via Helm without values-crd.yaml, but none of these works. Any ideas?
Thanks a lot.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 16 (3 by maintainers)
Ran into this issue as well. Recreated the postgresql cluster with an existing volume. The new cluster started up, but the postgres-operator is failing with the following error log: pq: password authentication failed for user "postgres"
Recovery procedure
Any updates on the issue not requiring manual intervention?
Well, we have the same problem, and I have probably figured out why, at least for our case.
We use Longhorn for storage provisioning and want to keep the data whatever it takes and reuse volumes, so we can shut down the whole cluster and set it up again. The users get created the first time in our database, and the operator provides a secret, postgres.project-postgres.credentials.postgresql.acid.zalan.do, whose password matches the entry in pg_shadow. When we restart the whole cluster/namespace, the "old" password of postgres (and of course of standby and all other roles) still persists on the volume, but a new password is generated and provided as a new secret. What is missing here, I think, is that init should alter the user's password when the user already exists. Does this make sense? https://github.com/zalando/postgres-operator/blob/692c721854e4667102778e8e69e3dd12e47984b5/pkg/util/users/users.go#L51 should handle this (it should also be called with EventSync at cluster startup).
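If that diagnosis is right, one manual workaround is to reset the in-database passwords to match the newly generated secrets. A hedged sketch, not an official procedure; role, cluster, and namespace names assume the cluster from this issue, and it runs against the current master pod:

```shell
# For each role whose secret no longer matches pg_shadow, read the
# secret's password and apply it inside the database. Names are
# illustrative; adjust cluster/namespace to your setup.
for ROLE in postgres standby; do
  SECRET="$ROLE.test-testdb-cluster.credentials.postgresql.acid.zalan.do"
  PW=$(kubectl get secret "$SECRET" -n test \
    -o jsonpath='{.data.password}' | base64 -d)
  kubectl exec -n test test-testdb-cluster-0 -- \
    psql -U postgres -c "ALTER ROLE $ROLE WITH PASSWORD '$PW';"
done
```

After this, the secrets and pg_shadow agree again, so clients using the secrets can authenticate.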
Can you give me some advice, @Jan-M?
Or, the other way around:
env | grep PASS
echo -n "grepped-password" | base64
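Completing that thought: the reverse direction is to keep the old in-database password and overwrite the secret with it. A hedged sketch, assuming the cluster from this issue; the exact environment variable name inside the pod may differ, which is what the env | grep PASS step above is for:

```shell
# Inside the database pod, grep the current superuser password from
# the environment (variable name is an assumption; verify with
# `env | grep PASS` first):
OLD_PW=$(kubectl exec -n test test-testdb-cluster-0 -- \
  sh -c 'env' | grep '^PGPASSWORD_SUPERUSER=' | cut -d= -f2)

# Base64-encode it and patch it into the operator's secret:
ENCODED=$(printf '%s' "$OLD_PW" | base64)
kubectl patch secret -n test \
  postgres.test-testdb-cluster.credentials.postgresql.acid.zalan.do \
  -p "{\"data\":{\"password\":\"$ENCODED\"}}"
```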