postgres-operator: Cluster went down, and backups do not work after recreating it
**Which example are you working with?** PostgreSQL Operator Installer
**What is the current behavior?** The cluster is up, but backups cannot be restored.
**What is the expected behavior?** Backups can be restored.
**Other information** (e.g. detailed explanation, related issues, etc.) Today the cluster went down. After a lot of debugging, I deleted the cluster and recreated it, but now every backup shows an error.
Please tell us about your environment:
- Operating System:
- Where is this running: Cloud Provider (Oracle)
- Storage being used: OCI
- Container Image Tag: centos8-4.6.2
- PostgreSQL Version: 13.2
- Platform: Kubernetes
- Platform Version: v1.19.7
If possible, please run the following Kubernetes commands and provide the results:
kubectl -n pgo describe pod mycluster-stanza-create-dblv6
Name: mycluster-stanza-create-dblv6
Namespace: pgo
Priority: 0
Node: 10.0.10.154/10.0.10.154
Start Time: Wed, 14 Apr 2021 14:52:25 +0300
Labels: backrest-command=stanza-create
controller-uid=f87ddc3c-b328-403b-a9d9-e06691e6ead3
job-name=mycluster-stanza-create
pg-cluster=mycluster
pgo-backrest=true
pgo-backrest-job=true
vendor=crunchydata
Annotations: <none>
Status: Failed
IP: 10.244.1.226
IPs:
IP: 10.244.1.226
Controlled By: Job/mycluster-stanza-create
Containers:
backrest:
Container ID: docker://988c38bc50d088a1b23d29b2a2498557b372b660cdeea5f79e36ffd6c6485391
Image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-13.2-4.6.2
Image ID: docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest@sha256:231f5e0fd4279569a4a8970151ef0dcf242117581b64010c4537cd2adb6f617e
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 14 Apr 2021 14:52:26 +0300
Finished: Wed, 14 Apr 2021 14:52:26 +0300
Ready: False
Restart Count: 0
Environment:
COMMAND: stanza-create
MODE: pgbackrest
COMMAND_OPTS: --db-host=10.244.1.48 --db-path=/pgdata/mycluster
PITR_TARGET:
PODNAME: mycluster-backrest-shared-repo-5bf646fb5d-2hw6n
PGBACKREST_STANZA:
PGBACKREST_DB_PATH:
PGBACKREST_REPO1_PATH:
PGBACKREST_REPO1_TYPE: posix
PGHA_PGBACKREST_LOCAL_S3_STORAGE: false
PGHA_PGBACKREST_S3_VERIFY_TLS: true
PGBACKREST_LOG_PATH: /tmp
NAMESPACE: pgo (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from pgo-backrest-token-g78pj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
pgo-backrest-token-g78pj:
Type: Secret (a volume populated by a Secret)
SecretName: pgo-backrest-token-g78pj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned pgo/mycluster-stanza-create-dblv6 to 10.0.10.154
Normal Pulled 43m kubelet Container image "registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-13.2-4.6.2" already present on machine
Normal Created 43m kubelet Created container backrest
Normal Started 43m kubelet Started container backrest
kubectl -n pgo describe pvc
Name: mycluster
Namespace: pgo
StorageClass: oci
Status: Bound
Volume: ocid1.volume.oc1.me-jeddah-1.abvgkljrfa3gulydhfjjzhmr47op4ae55doiqhqpitjomiygxwhos5ky63qa
Labels: pg-cluster=mycluster
pgremove=true
vendor=crunchydata
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: oracle.com/oci
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 50Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: mycluster-75dc58676-6vjcb
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 47m (x3 over 47m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "oracle.com/oci" or manually created by system administrator
Normal Provisioning 47m (x2 over 47m) oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c External provisioner is provisioning volume for claim "pgo/mycluster"
Normal ProvisioningSucceeded 47m (x2 over 47m) oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c Successfully provisioned volume ocid1.volume.oc1.me-jeddah-1.abvgkljrfa3gulydhfjjzhmr47op4ae55doiqhqpitjomiygxwhos5ky63qa
Name: mycluster-pgbr-repo
Namespace: pgo
StorageClass: oci
Status: Bound
Volume: ocid1.volume.oc1.me-jeddah-1.abvgkljrpzjvg5uquj37tqcfy57c3sicv4ljtlj545bch6awwoazebn7foaq
Labels: pg-cluster=mycluster
pgremove=true
vendor=crunchydata
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: oracle.com/oci
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 50Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: mycluster-backrest-shared-repo-5bf646fb5d-2hw6n
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 47m (x2 over 47m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "oracle.com/oci" or manually created by system administrator
Normal Provisioning 47m (x2 over 47m) oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c External provisioner is provisioning volume for claim "pgo/mycluster-pgbr-repo"
Normal ProvisioningSucceeded 47m (x2 over 47m) oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c Successfully provisioned volume ocid1.volume.oc1.me-jeddah-1.abvgkljrpzjvg5uquj37tqcfy57c3sicv4ljtlj545bch6awwoazebn7foaq
kubectl -n pgo logs mycluster-stanza-create-dblv6
time="2021-04-14T11:52:26Z" level=info msg="crunchy-pgbackrest starts"
time="2021-04-14T11:52:26Z" level=info msg="debug flag set to %tfalse"
time="2021-04-14T11:52:26Z" level=info msg="backrest stanza-create command requested"
time="2021-04-14T11:52:26Z" level=info msg="command to execute is [pgbackrest stanza-create --db-host=10.244.1.48 --db-path=/pgdata/mycluster]"
time="2021-04-14T11:52:26Z" level=info msg="output=[]"
time="2021-04-14T11:52:26Z" level=info msg="stderr=[WARN: unable to check pg-1: [UnknownError] remote-0 process on '10.244.1.48' terminated unexpectedly [255]\nERROR: [056]: unable to find primary cluster - cannot proceed\n]"
time="2021-04-14T11:52:26Z" level=fatal msg="command terminated with exit code 56"
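The pgBackRest error is buried inside the `stderr=[...]` field of the job log, with literal `\n` separators. A minimal sketch for pulling the error code and message out of such logs follows; the function name `extract_pgbackrest_errors` is hypothetical, and it assumes the log format is the same as in the output above.

```shell
# Hypothetical helper: reads crunchy-pgbackrest job logs on stdin and prints
# one "CODE message" line for every pgBackRest "ERROR: [CODE]: message" entry.
extract_pgbackrest_errors() {
  # The job log embeds pgBackRest stderr with literal "\n" sequences,
  # so first turn each of those into a real line break.
  sed 's/\\n/\
/g' |
    # Then keep only the ERROR lines, reduced to "CODE message".
    sed -n 's/.*ERROR: \[\([0-9][0-9]*\)\]: \(.*\)$/\1 \2/p'
}

# Usage (not executed here):
#   kubectl -n pgo logs mycluster-stanza-create-dblv6 | extract_pgbackrest_errors
```

For the log above this should reduce the output to the single `056` line ("unable to find primary cluster - cannot proceed"), which is easier to search for than the full logrus entry.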
pgo show backup mycluster
cluster: mycluster
storage type: posix
stanza: db
status: missing stanza path
cipher:
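To check this status from a script rather than by eye, the `status:` line can be pulled out of the `pgo show backup` output. This is only a sketch: the function name `backup_status_ok` is hypothetical, and it assumes that a healthy stanza reports `status: ok`, as `pgbackrest info` does.

```shell
# Hypothetical helper: reads `pgo show backup <cluster>` output on stdin,
# prints every status value, and returns:
#   0 if all statuses are "ok" (assumed healthy value),
#   1 if any status is something else,
#   2 if no status line was found.
backup_status_ok() {
  # Strip the "status:" label and surrounding whitespace, keeping the value.
  statuses=$(sed -n 's/^[[:space:]]*status:[[:space:]]*//p')
  [ -n "$statuses" ] || return 2
  printf '%s\n' "$statuses"
  # grep -v -x selects lines that are not exactly "ok"; any hit means unhealthy.
  if printf '%s\n' "$statuses" | grep -q -v -x 'ok'; then
    return 1
  fi
  return 0
}

# Usage (not executed here):
#   pgo show backup mycluster | backup_status_ok || echo "stanza is not healthy"
```

With the output above, the function prints `missing stanza path` and returns 1.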
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (8 by maintainers)
This is addressed in https://github.com/CrunchyData/postgres-operator/pull/2362, which will be available in 4.7.0 and 4.6.3 when they are released.