postgres-operator: Failure to restore backup

Describe the bug Create a cluster with no replicas, using S3 storage (MinIO) for backups, then try to restore it to a previous state.

pgo restore zzz --backup-opts=" --set=20210201-191209F"
If currently running, the primary database in this cluster will be stopped and recreated as part of this workflow!
WARNING: Are you sure? (yes/no): yes
restore request for zzz with opts " --set=20210201-191209F" and pitr-target=""
workflow id a3eea6e2-c83c-4dc8-b28a-bd9a983f5297
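
For reference, the label passed to --set here (20210201-191209F) comes from the pgBackRest backup info for the cluster; in PGO 4.x that list can typically be viewed with a command along the lines of:

pgo show backup zzz -n pgo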

The zzz primary is stopped and removed; these are the only pods left in the namespace:

NAME                                        READY   STATUS      RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
pgo-deploy-xfcvp                            0/1     Completed   0          12m   10.244.243.48   hestia   <none>           <none>
postgres-operator-797bcb5d6-4mh48           4/4     Running     0          12m   10.244.243.26   hestia   <none>           <none>
zzz-backrest-shared-repo-6666b49bf4-l8pt5   1/1     Running     0          10m   10.244.243.35   hestia   <none>           <none>

Logs from the postgres-operator pod:

time="2021-02-01T19:15:31Z" level=info msg="found existing pgha ConfigMap for cluster zzz, setting init flag to 'true'" func="internal/operator/cluster.AddClusterBootstrap()" file="internal/operator/cluster/cluster.go:270" version=4.6.0
time="2021-02-01T19:15:31Z" level=info msg="creating Pgcluster zzz in namespace pgo" func="internal/operator/cluster.getClusterDeploymentFields()" file="internal/operator/cluster/clusterlogic.go:263" version=4.6.0
time="2021-02-01T19:15:31Z" level=info msg="exporter secret zzz-exporter-secret already present, will reuse" func="internal/operator/cluster.CreateExporterSecret()" file="internal/operator/cluster/exporter.go:145" version=4.6.0
2021/02/01 19:15:31 INF   10 (localhost:4150) connecting to nsqd
time="2021-02-01T19:15:31Z" level=error msg="pgtask Controller: invalid character '{' looking for beginning of object key string" func="internal/controller/pgtask.(*Controller).handleBackrestRestore()" file="internal/controller/pgtask/backresthandler.go:54" version=4.6.0

After this, nothing happens. The only way to recover was to create another cluster in standby mode and promote it. That let me recover the state from before the failed restore, but I still do not know how to return to a previous backup.
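
For anyone stuck in the same state, the standby-and-promote path looks roughly like the following. This is only a sketch based on the PGO 4.x standby workflow; the cluster name zzz-standby is arbitrary, and the repo path and passwords shown are illustrative and must match the original cluster:

pgo create cluster zzz-standby --standby --pgbackrest-storage-type=s3 --pgbackrest-repo-path=/backrestrepo/zzz-backrest-shared-repo --password-superuser=xpto --password-replication=xpt0! --password=xpt0! -n pgo

Once the standby has replayed what is available in the S3 repository, promote it:

pgo update cluster zzz-standby --promote-standby -n pgo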

To Reproduce Steps to reproduce the behavior:

Start with a clean database

pgo create cluster zzz --replica-count=0 --node-label nodetype=testing --cpu=10.0 --cpu-limit=10.0 --memory=1248Mi --memory-limit=1248Mi --custom-config=zzz-tune-config --password-superuser=xpto --password-replication=xpt0! --password=xpt0! --metrics  --pgbackrest-storage-type=s3 -n pgo

Make a backup of the database

pgo backup zzz

Make some changes to the database, take another backup, and then try to restore to the state before the changes

pgo restore zzz --backup-opts=" --set=20210201-191209F"
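
The restore command prints a workflow id (as in the output above). While reproducing, its progress can be followed with something like:

pgo show workflow <workflow-id> -n pgo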

Expected behavior The database should restart with the previous state, i.e. the contents of the selected backup set.

Please tell us about your environment:

  • Operating System:
  • Where is this running ( Local, Cloud Provider): Local
  • Storage being used (NFS, Hostpath, Gluster, etc): Ceph
  • Container Image Tag: centos8-13.1-4.6.0
  • PostgreSQL Version: 13
  • Platform (Docker, Kubernetes, OpenShift): Kubernetes 1.20.1
  • Platform Version: 4.6.0

zzz-tune-config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: zzz-tune-config
  namespace: pgo
data:
  postgres-ha.yaml: |-
    bootstrap:
      dcs:
        postgresql:
          parameters:
            synchronous_commit: "off"
            shared_buffers: "256MB"
            effective_cache_size: "768MB"
            effective_io_concurrency: "20"
            max_parallel_workers: "2"
            max_parallel_workers_per_gather: "1"
            max_worker_processes: "16"
            checkpoint_completion_target: "0.9"
            wal_buffers: "7864kB"
            work_mem: "6553kB"
            max_connections: "20"
            random_page_cost: "1.5"
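
This ConfigMap is what the --custom-config=zzz-tune-config flag in the create command points at; it has to exist in the pgo namespace before the cluster is created. Assuming the manifest above is saved as zzz-tune-config.yaml, that is simply:

kubectl apply -f zzz-tune-config.yaml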

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

The fix is merged and will appear in 4.6.2. The fix is effectively what I showed in https://github.com/CrunchyData/postgres-operator/issues/2251#issuecomment-790948496, so that will let you get by in the interim. Thanks for reporting and troubleshooting!