postgres-operator: major version upgrade failed

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.7.1
  • Where do you run it - cloud or metal? Syseleven (OpenStack)
  • Are you running Postgres Operator in production? yes
  • Type of issue? [Bug report, question, feature request, etc.] Bug

We just tried to upgrade the major Postgres version of one of our Patroni clusters.

For this we updated the following fields in the CR:

spec.dockerImage: from "registry.opensource.zalan.do/acid/spilo-12:1.6-p5" to "registry.opensource.zalan.do/acid/spilo-13:2.1-p1"

spec.postgresql.version: from "12" to "13"
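
For illustration, the same change could be applied with kubectl patch; a minimal sketch, where the CR name auth-postgres-db and the namespace auth are taken from the operator logs below:

kubectl -n auth patch postgresql auth-postgres-db --type merge -p '{"spec": {"dockerImage": "registry.opensource.zalan.do/acid/spilo-13:2.1-p1", "postgresql": {"version": "13"}}}'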

At first everything looked fine, as the pods were successfully restarted, but the operator logs show that the upgrade failed:

time="2021-11-15T16:26:43Z" level=debug msg="making GET http request: http://172.25.78.171:8008/patroni" cluster-name=auth/auth-postgres-db pkg=cluster
time="2021-11-15T16:26:43Z" level=info msg="healthy cluster ready to upgrade, current: 120008 desired: 130000" cluster-name=auth/auth-postgres-db pkg=cluster
time="2021-11-15T16:26:43Z" level=info msg="triggering major version upgrade on pod auth-postgres-db-2 of 3 pods" cluster-name=auth/auth-postgres-db pkg=cluster
time="2021-11-15T16:26:45Z" level=error msg="major version upgrade failed: could not execute: command terminated with exit code 1" cluster-name=auth/auth-postgres-db pkg=cluster
time="2021-11-15T16:26:45Z" level=info msg="cluster has been synced" cluster-name=auth/auth-postgres-db pkg=controller worker=0
time="2021-11-15T16:26:45Z" level=info msg="recieved add event for already existing Postgres cluster" cluster-name=auth/auth-postgres-db pkg=controller worker=0

What can we do to get the upgrade to run?

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Our upgrade script does the heavy lifting: it executes pg_upgrade and upgrades the replicas with rsync. In addition, the script takes care of very specific Spilo configuration; if your configuration is far from the standard one, the upgrade might fail.

Actually, there are many reasons why a major upgrade could fail. Some of them, for example, could be related to your database schema, and, as you may guess, our script will not mess with that.

There is only one recipe if the upgrade doesn’t work automatically (see the command sketch after the list):

  1. Exec into the master pod as the postgres user: kubectl exec -ti my-pod-0 -- su postgres
  2. Call the upgrade script manually: python3 /scripts/inplace_upgrade.py <NUM>, where <NUM> is the number of pods in your cluster.
  3. Check the upgrade logs in /home/postgres/pgdata/pgroot/data_upgrade.
  4. Fix the reported issues and go back to step 2.
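
Put together as commands, the recipe looks roughly like this; a sketch, assuming a 3-pod cluster named auth-postgres-db as in the logs above and that pod -0 is the current master (verify the role first, e.g. with patronictl list):

# step 1: exec into the master pod as the postgres user
kubectl exec -ti auth-postgres-db-0 -- su postgres

# step 2 (inside the pod): run the upgrade script, passing the number of pods in the cluster
python3 /scripts/inplace_upgrade.py 3

# step 3: inspect the upgrade logs if the script reports errors,
# then fix the issues and repeat step 2
ls -l /home/postgres/pgdata/pgroot/data_upgrade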