postgres-operator: Failure to restore backup

Describe the bug Create a cluster with no replicas, using S3 storage (MinIO) for backups, then try to restore it to a previous state.

pgo restore zzz --backup-opts=" --set=20210201-191209F"
If currently running, the primary database in this cluster will be stopped and recreated as part of this workflow!
WARNING: Are you sure? (yes/no): yes
restore request for zzz with opts " --set=20210201-191209F" and pitr-target=""
workflow id a3eea6e2-c83c-4dc8-b28a-bd9a983f5297
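
For reference, the label passed to --set here (20210201-191209F) comes from the pgBackRest backup info for the cluster; in PGO 4.x that list can typically be viewed with a command along the lines of:

pgo show backup zzz -n pgo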

The zzz primary is stopped and removed; these are the only pods left in the namespace:

NAME                                        READY   STATUS      RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
pgo-deploy-xfcvp                            0/1     Completed   0          12m   10.244.243.48   hestia   <none>           <none>
postgres-operator-797bcb5d6-4mh48           4/4     Running     0          12m   10.244.243.26   hestia   <none>           <none>
zzz-backrest-shared-repo-6666b49bf4-l8pt5   1/1     Running     0          10m   10.244.243.35   hestia   <none>           <none>

Logs from the postgres-operator pod:

time="2021-02-01T19:15:31Z" level=info msg="found existing pgha ConfigMap for cluster zzz, setting init flag to 'true'" func="internal/operator/cluster.AddClusterBootstrap()" file="internal/operator/cluster/cluster.go:270" version=4.6.0
time="2021-02-01T19:15:31Z" level=info msg="creating Pgcluster zzz in namespace pgo" func="internal/operator/cluster.getClusterDeploymentFields()" file="internal/operator/cluster/clusterlogic.go:263" version=4.6.0
time="2021-02-01T19:15:31Z" level=info msg="exporter secret zzz-exporter-secret already present, will reuse" func="internal/operator/cluster.CreateExporterSecret()" file="internal/operator/cluster/exporter.go:145" version=4.6.0
2021/02/01 19:15:31 INF   10 (localhost:4150) connecting to nsqd
time="2021-02-01T19:15:31Z" level=error msg="pgtask Controller: invalid character '{' looking for beginning of object key string" func="internal/controller/pgtask.(*Controller).handleBackrestRestore()" file="internal/controller/pgtask/backresthandler.go:54" version=4.6.0

After this, nothing happens. The only way to recover was to create another cluster in standby mode and promote it. That let me recover the state from before the failed restore, but I still do not know how to return to a previous backup.
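
For anyone stuck in the same state, the standby-and-promote path looks roughly like the following. This is only a sketch based on the PGO 4.x standby workflow; the cluster name zzz-standby is arbitrary, and the repo path and passwords shown are illustrative and must match the original cluster:

pgo create cluster zzz-standby --standby --pgbackrest-storage-type=s3 --pgbackrest-repo-path=/backrestrepo/zzz-backrest-shared-repo --password-superuser=xpto --password-replication=xpt0! --password=xpt0! -n pgo

Once the standby has replayed what is available in the S3 repository, promote it:

pgo update cluster zzz-standby --promote-standby -n pgo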

To Reproduce Steps to reproduce the behavior:

Start with a clean database

pgo create cluster zzz --replica-count=0 --node-label nodetype=testing --cpu=10.0 --cpu-limit=10.0 --memory=1248Mi --memory-limit=1248Mi --custom-config=zzz-tune-config --password-superuser=xpto --password-replication=xpt0! --password=xpt0! --metrics  --pgbackrest-storage-type=s3 -n pgo

Make a backup of the database

pgo backup zzz

Make some changes to the database, take another backup, and then try to restore to the state before the changes

pgo restore zzz --backup-opts=" --set=20210201-191209F"
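
The restore command prints a workflow id (as in the output above). While reproducing, its progress can be followed with something like:

pgo show workflow <workflow-id> -n pgo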

Expected behavior The database should restart with the previous state, i.e. the contents of the selected backup set.

Please tell us about your environment:

  • Operating System:
  • Where is this running ( Local, Cloud Provider): Local
  • Storage being used (NFS, Hostpath, Gluster, etc): Ceph
  • Container Image Tag: centos8-13.1-4.6.0
  • PostgreSQL Version: 13
  • Platform (Docker, Kubernetes, OpenShift): Kubernetes 1.20.1
  • Platform Version: 4.6.0

zzz-tune-config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: zzz-tune-config
  namespace: pgo
data:
  postgres-ha.yaml: |-
    bootstrap:
      dcs:
        postgresql:
          parameters:
            synchronous_commit: "off"
            shared_buffers: "256MB"
            effective_cache_size: "768MB"
            effective_io_concurrency: "20"
            max_parallel_workers: "2"
            max_parallel_workers_per_gather: "1"
            max_worker_processes: "16"
            checkpoint_completion_target: "0.9"
            wal_buffers: "7864kB"
            work_mem: "6553kB"
            max_connections: "20"
            random_page_cost: "1.5"
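
This ConfigMap is what the --custom-config=zzz-tune-config flag in the create command points at; it has to exist in the pgo namespace before the cluster is created. Assuming the manifest above is saved as zzz-tune-config.yaml, that is simply:

kubectl apply -f zzz-tune-config.yaml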

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

The fix is merged and will appear in 4.6.2. The fix is effectively what I showed in https://github.com/CrunchyData/postgres-operator/issues/2251#issuecomment-790948496, so that will let you get by in the interim. Thanks for reporting and troubleshooting!