postgresql-container: Redeployment unable to start up again

Updated the resource limits for a postgresql-persistent 9.5 deployment, which triggered a redeployment. The new pod failed to start with:

 pg_ctl: another server might be running; trying to start server anyway
 waiting for server to start....LOG:  redirecting log output to logging collector process
 HINT:  Future log output will appear in directory "pg_log".
  done
 server started
 ERROR:  tuple already updated by self

It seems the first pod did not shut down cleanly and left its PID behind in /var/lib/pgsql/data/userdata/postmaster.pid on the volume, preventing the container from starting up again without manual intervention (a sketch of the cleanup follows).
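A cleanup along these lines should work; it is only a sketch, and it assumes an OpenShift DeploymentConfig named postgresql (a placeholder, adjust the name and path to your deployment):

  # stop the running pod so nothing else touches the data volume
  oc scale dc/postgresql --replicas=0

  # start a debug pod that mounts the same persistent volume
  oc debug dc/postgresql

  # inside the debug pod: remove the stale lock file, then exit
  rm /var/lib/pgsql/data/userdata/postmaster.pid
  exit

  # bring the database pod back up
  oc scale dc/postgresql --replicas=1

As the comments below show, removing the file is not always sufficient: if two servers have already modified the data directory concurrently, the server can keep failing with "tuple already updated by self" and needs the catalog repair described later in the thread.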

Perhaps an edge case, as this is the first time I have seen this across many other postgresql deployments.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 12
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Hi, I'm facing the same issue, using the Recreate strategy. Deleting the postmaster.pid also did not help, as I got the same error at the next pod startup. Any idea on how to fix or work around this?

Hi @martin123218

The problem with the Rolling strategy is that it tells OpenShift to first create a new pod with the same data volume as the old one, and only shut down the original pod once the new pod is up and running. Since, for a time, two pods are accessing (and presumably writing to) the same data volume, you can run into this issue.

Please use the Recreate strategy instead (see the sketch below). There will be some downtime, since the new pod is only started after the old pod has shut down, but you should not run into this issue anymore.
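A minimal sketch of switching an existing DeploymentConfig to Recreate, assuming it is named postgresql (a placeholder):

  # change the deployment strategy from Rolling to Recreate
  oc patch dc/postgresql -p '{"spec":{"strategy":{"type":"Recreate"}}}'

  # verify the change
  oc get dc/postgresql -o jsonpath='{.spec.strategy.type}'

The same setting can of course be made directly in the template or DeploymentConfig YAML under spec.strategy.type.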

Hello, I'm facing the same problem here: the container is unable to start, with the same output as detailed above. Has anyone found a solution or workaround for this problem? I quickly tried removing the /var/lib/pgsql/data/userdata/postmaster.pid file, but when starting the container I'm getting the same issue.

EDIT: I double checked, and in my case the output is:

 pg_ctl: another server might be running; trying to start server anyway
 waiting for server to start....LOG:  redirecting log output to logging collector process
 HINT:  Future log output will appear in directory "pg_log".
 ... done
 server started
 => sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
 ERROR:  tuple already updated by self

This is an old issue, but I just faced the same problem with the Recreate strategy. The following articles explain how to reanimate the failing pod, and they helped me:
https://pathfinder-faq-ocio-pathfinder-prod.pathfinder.gov.bc.ca/DB/PostgresqlCrashLoopTupleError.html
https://serverfault.com/questions/942743/postgres-crash-loop-caused-by-a-tuple-concurrently-updated-error

We use only one database pod, so this may not solve the high-availability problem, but at least the database will work with one pod. Maybe it will be useful for somebody.
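In outline, the recovery described in those links amounts to running the server once in single-user mode against the same data directory and reindexing the damaged system catalogs. A sketch, assuming a debug pod with the data volume mounted, the image's default data path, and that the affected database is postgres (an assumption; use your database name):

  # inside a debug pod with the data volume mounted:
  # start a standalone (single-user) backend against the data directory
  postgres --single -D /var/lib/pgsql/data/userdata postgres

  # at the "backend>" prompt, rebuild the system catalogs
  REINDEX SYSTEM postgres;

  # press Ctrl-D to stop the standalone backend, then exit the debug pod
  # and let the deployment start the database pod normally

See the linked pages for the exact OpenShift steps.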

Not with this trivial layout. This problem is equivalent to the non-container scenario where you run dnf update postgresql-server: you have to shut down the old server and start a new one. I.e., you cannot let two servers write into the same data directory.

Btw., the PostgreSQL server has a guard against the “multiple servers writing to the same data directory” situation, but unfortunately, in the container scenario the server has a deterministic PID (PID = 1). So a concurrent PostgreSQL server (in a different container) checks the pid/lock file, compares the recorded PID with its own PID, and concludes “I’m PID 1, so the PID file is some leftover from a previous run.” It then removes the PID file and continues with data directory modifications. This has disaster potential.
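To make the PID determinism concrete: the entrypoint process of a container normally runs as PID 1, so every restarted server container finds its own PID in the stale lock file. A quick illustration with a generic image (any container runtime behaves the same):

  # the first process in a container is PID 1
  $ docker run --rm alpine sh -c 'echo $$'
  1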

Our templates only support the Recreate strategy. The fact that Rolling “mostly” works is a matter of luck, namely that the old server is not under heavy load.

That said, the zero-downtime problem needs to be solved at a higher logical layer.