postgres-operator: Failure to restore backup
Describe the bug
Create a cluster with no replicas and S3 storage (MinIO) for backups, then try to restore it to a previous state.
pgo restore zzz --backup-opts=" --set=20210201-191209F"
If currently running, the primary database in this cluster will be stopped and recreated as part of this workflow!
WARNING: Are you sure? (yes/no): yes
restore request for zzz with opts " --set=20210201-191209F" and pitr-target=""
workflow id a3eea6e2-c83c-4dc8-b28a-bd9a983f5297
zzz is stopped and removed
NAME                                        READY   STATUS      RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
pgo-deploy-xfcvp                            0/1     Completed   0          12m   10.244.243.48   hestia   <none>           <none>
postgres-operator-797bcb5d6-4mh48           4/4     Running     0          12m   10.244.243.26   hestia   <none>           <none>
zzz-backrest-shared-repo-6666b49bf4-l8pt5   1/1     Running     0          10m   10.244.243.35   hestia   <none>           <none>
Logs from postgres-operator
time="2021-02-01T19:15:31Z" level=info msg="found existing pgha ConfigMap for cluster zzz, setting init flag to 'true'" func="internal/operator/cluster.AddClusterBootstrap()" file="internal/operator/cluster/cluster.go:270" version=4.6.0
time="2021-02-01T19:15:31Z" level=info msg="creating Pgcluster zzz in namespace pgo" func="internal/operator/cluster.getClusterDeploymentFields()" file="internal/operator/cluster/clusterlogic.go:263" version=4.6.0
time="2021-02-01T19:15:31Z" level=info msg="exporter secret zzz-exporter-secret already present, will reuse" func="internal/operator/cluster.CreateExporterSecret()" file="internal/operator/cluster/exporter.go:145" version=4.6.0
2021/02/01 19:15:31 INF 10 (localhost:4150) connecting to nsqd
time="2021-02-01T19:15:31Z" level=error msg="pgtask Controller: invalid character '{' looking for beginning of object key string" func="internal/controller/pgtask.(*Controller).handleBackrestRestore()" file="internal/controller/pgtask/backresthandler.go:54" version=4.6.0
After this, nothing happens. The only way to recover was to create another cluster in standby mode and promote it. That let me get back to the state from before the failed restore, but I do not know how to return to a previous backup.
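The error in the last log line has the shape of a message from Go's `encoding/json` package, which reports it when an object (`{`) appears where a key string is expected. The related commits describe a syntax fix for restore jobs with node affinity, so a malformed JSON fragment is a plausible cause. The following is a minimal sketch of that failure mode only: the `nodeAffinity` fragment and the `decodeFragment` helper are hypothetical and are not the operator's actual template or code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeFragment is a hypothetical helper: it decodes a JSON document into a
// generic map and returns whatever error encoding/json reports.
func decodeFragment(doc string) error {
	var out map[string]interface{}
	return json.Unmarshal([]byte(doc), &out)
}

func main() {
	// Hypothetical, deliberately malformed fragment: a second '{' opens an
	// object at the position where a key string is required.
	broken := `{"nodeAffinity": {{"key": "nodetype", "values": ["testing"]}}}`

	// The same data with the stray braces removed.
	fixed := `{"nodeAffinity": {"key": "nodetype", "values": ["testing"]}}`

	fmt.Println("broken:", decodeFragment(broken))
	fmt.Println("fixed: ", decodeFragment(fixed))
}
```

If the restore task carries a fragment like `broken`, the decoder rejects the whole document before any restore work starts, which would be consistent with the operator logging the error and then doing nothing further.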
To Reproduce
Steps to reproduce the behavior:
- Start with a clean database:
  pgo create cluster zzz --replica-count=0 --node-label nodetype=testing --cpu=10.0 --cpu-limit=10.0 --memory=1248Mi --memory-limit=1248Mi --custom-config=zzz-tune-config --password-superuser=xpto --password-replication=xpt0! --password=xpt0! --metrics --pgbackrest-storage-type=s3 -n pgo
- Make a database backup:
  pgo backup zzz
- Make some database changes, take another backup, and then try to restore to the state before the changes:
  pgo restore zzz --backup-opts=" --set=20210201-191209F"
Expected behavior
The database should restart with the previous database state.
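The `--set` value passed in the restore command is a pgBackRest backup label. Full backups are labelled `YYYYMMDD-HHMMSSF`, and differential or incremental backups append `_YYYYMMDD-HHMMSSD` or `_YYYYMMDD-HHMMSSI` to the label of the full backup they depend on. As a rough sketch, a label can be checked before it is passed to `pgo restore`. The `parseLabel` helper is hypothetical, and because the label carries no time zone, the timestamp is parsed as UTC here as an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// labelRe matches a full backup label, optionally followed by the suffix of a
// differential (D) or incremental (I) backup that depends on it.
var labelRe = regexp.MustCompile(`^(\d{8}-\d{6})F(?:_(\d{8}-\d{6})([DI]))?$`)

// parseLabel is a hypothetical helper that returns the backup type and the
// timestamp embedded in a pgBackRest backup label.
func parseLabel(label string) (string, time.Time, error) {
	m := labelRe.FindStringSubmatch(label)
	if m == nil {
		return "", time.Time{}, fmt.Errorf("not a pgBackRest backup label: %q", label)
	}
	kind, stamp := "full", m[1]
	switch m[3] {
	case "D":
		kind, stamp = "differential", m[2]
	case "I":
		kind, stamp = "incremental", m[2]
	}
	// Assumption: no zone is encoded in the label, so time.Parse yields UTC.
	taken, err := time.Parse("20060102-150405", stamp)
	if err != nil {
		return "", time.Time{}, err
	}
	return kind, taken, nil
}

func main() {
	kind, taken, err := parseLabel("20210201-191209F")
	if err != nil {
		fmt.Println("invalid label:", err)
		return
	}
	fmt.Println(kind, taken.Format(time.RFC3339))
}
```

This only validates the shape of the label. It does not check that the backup set actually exists in the repository.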
Please tell us about your environment:
- Operating System:
- Where is this running ( Local, Cloud Provider): Local
- Storage being used (NFS, Hostpath, Gluster, etc): Ceph
- Container Image Tag: centos8-13.1-4.6.0
- PostgreSQL Version: 13
- Platform (Docker, Kubernetes, OpenShift): Kubernetes 1.20.1
- Platform Version: 4.6.0
zzz-tune-config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: zzz-tune-config
  namespace: pgo
data:
  postgres-ha.yaml: |-
    bootstrap:
      dcs:
        postgresql:
          parameters:
            synchronous_commit: "off"
            shared_buffers: "256MB"
            effective_cache_size: "768MB"
            effective_io_concurrency: "20"
            max_parallel_workers: "2"
            max_parallel_workers_per_gather: "1"
            max_worker_processes: "16"
            checkpoint_completion_target: "0.9"
            wal_buffers: "7864kB"
            work_mem: "6553kB"
            max_connections: "20"
            random_page_cost: "1.5"
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (8 by maintainers)
Commits related to this issue
- Fix syntax for restore jobs with node affinity These changes were not carried through when the syntax updates for node affinity were introduced in 9818ac9e. Issue: [ch10696] Issue: #2251 — committed to jkatz/postgres-operator by deleted user 3 years ago
- Fix syntax for restore jobs with node affinity These changes were not carried through when the syntax updates for node affinity were introduced in 9818ac9e. Issue: [ch10696] Issue: #2251 — committed to CrunchyData/postgres-operator by jkatz 3 years ago
The fix is merged and will appear in 4.6.2. It is effectively what I showed in https://github.com/CrunchyData/postgres-operator/issues/2251#issuecomment-790948496, so that should let you get by in the interim. Thanks for reporting and troubleshooting!