postgres-operator: Permission issue with tls cert after upgrading to 1.5

After upgrading from 1.4 to 1.5, my cluster could not initialize.

I’ve checked that the certs are mounted into the container, and I’m able to read them as the root user. Not sure where to look next.

Logs from cluster pods:

...
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /usr/lib/postgresql/12/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data -l logfile start

2020-05-20 14:40:25 UTC [301]: [1-1] 5ec54159.12d 0     FATAL:  could not load server certificate file "/tls/tls.crt": Permission denied
2020-05-20 14:40:25 UTC [301]: [2-1] 5ec54159.12d 0     LOG:  database system is shut down
2020-05-20 14:40:25,108 INFO: postmaster pid=301
/var/run/postgresql:5432 - no response
2020-05-20 14:40:25,121 INFO: removing initialize key after failed attempt to bootstrap the cluster
2020-05-20 14:40:25,137 INFO: renaming data directory to /home/postgres/pgdata/pgroot/data_2020-05-20-14-40-25
2020-05-20 14:40:25,587 INFO: Lock owner: None; I am grafana-cluster-0
Traceback (most recent call last):
  File "/usr/local/bin/patroni", line 11, in <module>
    load_entry_point('patroni==1.6.5', 'console_scripts', 'patroni')()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 235, in main
    return patroni_main()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 199, in patroni_main
    patroni.run()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 135, in run
    logger.info(self.ha.run_cycle())
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1370, in run_cycle
    info = self._run_cycle()
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1277, in _run_cycle
    return self.post_bootstrap()
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1173, in post_bootstrap
    self.cancel_initialization()
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1168, in cancel_initialization
    raise PatroniException('Failed to bootstrap cluster')
patroni.exceptions.PatroniException: 'Failed to bootstrap cluster'
/run/service/patroni: finished with code=1 signal=0
/run/service/patroni: exceeded maximum number of restarts 5
stopping /run/service/patroni
timeout: finish: .: (pid 303) 10s, want down

Permissions in container:

root@grafana-cluster-0:/home/postgres# ls -la /tls
total 4
drwxrwxrwt 3 root root  120 May 20 15:16 .
drwxr-xr-x 1 root root 4096 May 20 15:17 ..
drwxr-xr-x 2 root root   80 May 20 15:16 ..2020_05_20_15_16_36.928412376
lrwxrwxrwx 1 root root   31 May 20 15:16 ..data -> ..2020_05_20_15_16_36.928412376
lrwxrwxrwx 1 root root   14 May 20 15:16 tls.crt -> ..data/tls.crt
lrwxrwxrwx 1 root root   14 May 20 15:16 tls.key -> ..data/tls.key
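
For anyone debugging the same thing: the listing above only shows the symlinks (which are always lrwxrwxrwx), not the mode of the real files behind ..data, so it does not explain why root can read the cert while the postgres process gets "Permission denied". Here is a rough sketch of the check, assuming classic Unix mode bits only (no ACLs, no capabilities, no directory traversal checks). The helper names are made up for illustration, and any uid/gid you pass in should come from running `id postgres` in the container rather than from a guess.

```python
import os
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the read-permission check using plain mode bits only."""
    if uid == 0:
        # Simplification: root normally bypasses the mode bits for reads.
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


def check_path(path, uid, gids):
    """Follow symlinks (os.stat does this) and test the real file."""
    st = os.stat(path)
    return can_read(st.st_mode, st.st_uid, st.st_gid, uid, gids)
```

Inside the pod, calling check_path("/tls/tls.crt", uid, gids) with the ids of the postgres user should return False if the mounted secret is only readable by root or by a group the postgres user is not in, which would match the FATAL line in the log.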

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 23 (6 by maintainers)

Most upvoted comments

and magically things work!

this took a lot of running around.

I think the documentation could be improved in this area. https://postgres-operator.readthedocs.io/en/latest/user/#custom-tls-certificates is a start, but if applications are going to be fussy because of forced TLS with an invalid certificate, there should be more setup instructions to help clients communicate with the new database.

This is happening for me even on a fresh installation of 1.5.0.

With regard to toggling securityContext (which contains the fsGroup) not triggering a rolling update: that sounds like a bug that should be fixed.

  • honor changes in securityContext and propagate on changes