postgres-operator: Cluster went down, and backups fail after recreating the cluster

Which example are you working with? PostgreSQL Operator Installer

What is the current behavior? The cluster is up, but backups cannot be restored.

What is the expected behavior? Backups can be restored.

Other information (e.g. detailed explanation, related issues, etc.): Today the cluster went down. After a lot of debugging, I deleted the cluster and recreated it, but now all backups show errors.

Please tell us about your environment:

  • Operating System:
  • Where is this running: Cloud Provider (Oracle)
  • Storage being used: OCI
  • Container Image Tag: centos8-4.6.2
  • PostgreSQL Version: 13.2
  • Platform: Kubernetes
  • Platform Version: v1.19.7

If possible, please run the following Kubernetes commands and provide the results: kubectl -n pgo describe pod mycluster-stanza-create-dblv6

Name:         mycluster-stanza-create-dblv6
Namespace:    pgo
Priority:     0
Node:         10.0.10.154/10.0.10.154
Start Time:   Wed, 14 Apr 2021 14:52:25 +0300
Labels:       backrest-command=stanza-create
              controller-uid=f87ddc3c-b328-403b-a9d9-e06691e6ead3
              job-name=mycluster-stanza-create
              pg-cluster=mycluster
              pgo-backrest=true
              pgo-backrest-job=true
              vendor=crunchydata
Annotations:  <none>
Status:       Failed
IP:           10.244.1.226
IPs:
  IP:           10.244.1.226
Controlled By:  Job/mycluster-stanza-create
Containers:
  backrest:
    Container ID:   docker://988c38bc50d088a1b23d29b2a2498557b372b660cdeea5f79e36ffd6c6485391
    Image:          registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-13.2-4.6.2
    Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest@sha256:231f5e0fd4279569a4a8970151ef0dcf242117581b64010c4537cd2adb6f617e
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 14 Apr 2021 14:52:26 +0300
      Finished:     Wed, 14 Apr 2021 14:52:26 +0300
    Ready:          False
    Restart Count:  0
    Environment:
      COMMAND:                           stanza-create
      MODE:                              pgbackrest
      COMMAND_OPTS:                       --db-host=10.244.1.48 --db-path=/pgdata/mycluster
      PITR_TARGET:
      PODNAME:                           mycluster-backrest-shared-repo-5bf646fb5d-2hw6n
      PGBACKREST_STANZA:
      PGBACKREST_DB_PATH:
      PGBACKREST_REPO1_PATH:
      PGBACKREST_REPO1_TYPE:             posix
      PGHA_PGBACKREST_LOCAL_S3_STORAGE:  false
      PGHA_PGBACKREST_S3_VERIFY_TLS:     true
      PGBACKREST_LOG_PATH:               /tmp
      NAMESPACE:                         pgo (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pgo-backrest-token-g78pj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  pgo-backrest-token-g78pj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pgo-backrest-token-g78pj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  43m   default-scheduler  Successfully assigned pgo/mycluster-stanza-create-dblv6 to 10.0.10.154
  Normal  Pulled     43m   kubelet            Container image "registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-13.2-4.6.2" already present on machine
  Normal  Created    43m   kubelet            Created container backrest
  Normal  Started    43m   kubelet            Started container backrest
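
Several entries in the Environment block above have no value (for example PGBACKREST_STANZA and PGBACKREST_REPO1_PATH). As a reading aid only, here is a minimal Python sketch that lists such entries from the text output of `kubectl describe pod`. The function names and the shortened sample are hypothetical, and whether the empty values have anything to do with the failure is not established in this report.

```python
def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def empty_env_vars(describe_text):
    """List Environment entries that have no value.

    Assumption: the input is the plain-text output of `kubectl describe pod`.
    Only the first Environment block (first container) is inspected.
    """
    names = []
    env_indent = None
    for line in describe_text.splitlines():
        if not line.strip():
            continue
        if env_indent is None:
            # Skip everything until the Environment header is found.
            if line.strip() == "Environment:":
                env_indent = indent_of(line)
            continue
        # A line indented no deeper than the header ends the block (e.g. "Mounts:").
        if indent_of(line) <= env_indent:
            break
        name, _, value = line.strip().partition(":")
        if not value.strip():
            names.append(name)
    return names


# Shortened excerpt of the output shown above.
SAMPLE = """\
    Environment:
      COMMAND:                           stanza-create
      COMMAND_OPTS:                       --db-host=10.244.1.48 --db-path=/pgdata/mycluster
      PITR_TARGET:
      PGBACKREST_STANZA:
      PGBACKREST_REPO1_TYPE:             posix
      NAMESPACE:                         pgo (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pgo-backrest-token-g78pj (ro)
"""

if __name__ == "__main__":
    print(empty_env_vars(SAMPLE))
```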

kubectl -n pgo describe pvc

Name:          mycluster
Namespace:     pgo
StorageClass:  oci
Status:        Bound
Volume:        ocid1.volume.oc1.me-jeddah-1.abvgkljrfa3gulydhfjjzhmr47op4ae55doiqhqpitjomiygxwhos5ky63qa
Labels:        pg-cluster=mycluster
               pgremove=true
               vendor=crunchydata
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: oracle.com/oci
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      50Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    mycluster-75dc58676-6vjcb
Events:
  Type    Reason                 Age                From                                                                                  Message
  ----    ------                 ----               ----                                                                                  -------
  Normal  ExternalProvisioning   47m (x3 over 47m)  persistentvolume-controller                                                           waiting for a volume to be created, either by external provisioner "oracle.com/oci" or manually created by system administrator
  Normal  Provisioning           47m (x2 over 47m)  oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c  External provisioner is provisioning volume for claim "pgo/mycluster"
  Normal  ProvisioningSucceeded  47m (x2 over 47m)  oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c  Successfully provisioned volume ocid1.volume.oc1.me-jeddah-1.abvgkljrfa3gulydhfjjzhmr47op4ae55doiqhqpitjomiygxwhos5ky63qa


Name:          mycluster-pgbr-repo
Namespace:     pgo
StorageClass:  oci
Status:        Bound
Volume:        ocid1.volume.oc1.me-jeddah-1.abvgkljrpzjvg5uquj37tqcfy57c3sicv4ljtlj545bch6awwoazebn7foaq
Labels:        pg-cluster=mycluster
               pgremove=true
               vendor=crunchydata
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: oracle.com/oci
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      50Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    mycluster-backrest-shared-repo-5bf646fb5d-2hw6n
Events:
  Type    Reason                 Age                From                                                                                  Message
  ----    ------                 ----               ----                                                                                  -------
  Normal  ExternalProvisioning   47m (x2 over 47m)  persistentvolume-controller                                                           waiting for a volume to be created, either by external provisioner "oracle.com/oci" or manually created by system administrator
  Normal  Provisioning           47m (x2 over 47m)  oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c  External provisioner is provisioning volume for claim "pgo/mycluster-pgbr-repo"
  Normal  ProvisioningSucceeded  47m (x2 over 47m)  oracle.com/oci_control-plane-host-10-64-195-118_8fd7a8e5-77e1-450c-985d-a66f4714e84c  Successfully provisioned volume ocid1.volume.oc1.me-jeddah-1.abvgkljrpzjvg5uquj37tqcfy57c3sicv4ljtlj545bch6awwoazebn7foaq


kubectl -n pgo logs mycluster-stanza-create-dblv6


time="2021-04-14T11:52:26Z" level=info msg="crunchy-pgbackrest starts"
time="2021-04-14T11:52:26Z" level=info msg="debug flag set to %tfalse"
time="2021-04-14T11:52:26Z" level=info msg="backrest stanza-create command requested"
time="2021-04-14T11:52:26Z" level=info msg="command to execute is [pgbackrest stanza-create  --db-host=10.244.1.48 --db-path=/pgdata/mycluster]"
time="2021-04-14T11:52:26Z" level=info msg="output=[]"
time="2021-04-14T11:52:26Z" level=info msg="stderr=[WARN: unable to check pg-1: [UnknownError] remote-0 process on '10.244.1.48' terminated unexpectedly [255]\nERROR: [056]: unable to find primary cluster - cannot proceed\n]"
time="2021-04-14T11:52:26Z" level=fatal msg="command terminated with exit code 56"
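
The failure itself is buried in the `stderr=[...]` message of the job log. A minimal Python sketch for pulling the pgBackRest error code and text out of log lines in this key="value" format follows. The regular expressions and function names are assumptions made for illustration, the unescaping is simplified, and this is not part of the operator or of pgBackRest.

```python
import re

# key=value pairs; a value is either a double-quoted string (with backslash
# escapes) or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# pgBackRest error lines such as: ERROR: [056]: some message
ERROR_RE = re.compile(r'ERROR: \[(\d+)\]: (.+)')


def parse_line(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in PAIR_RE.findall(line):
        if raw.startswith('"'):
            # Simplified unescaping: only \n and \" are handled.
            raw = raw[1:-1].replace('\\n', '\n').replace('\\"', '"')
        fields[key] = raw
    return fields


def pgbackrest_errors(log_text):
    """Return (code, message) tuples for every ERROR entry found in msg fields."""
    errors = []
    for line in log_text.splitlines():
        msg = parse_line(line).get("msg", "")
        for code, text in ERROR_RE.findall(msg):
            errors.append((int(code), text.strip()))
    return errors


# Two lines copied from the log above (raw string keeps the literal \n).
SAMPLE_LOG = r'''time="2021-04-14T11:52:26Z" level=info msg="output=[]"
time="2021-04-14T11:52:26Z" level=info msg="stderr=[WARN: unable to check pg-1: [UnknownError] remote-0 process on '10.244.1.48' terminated unexpectedly [255]\nERROR: [056]: unable to find primary cluster - cannot proceed\n]"'''

if __name__ == "__main__":
    for code, text in pgbackrest_errors(SAMPLE_LOG):
        print(code, text)
```

The extracted code matches the exit code in the final `level=fatal` line, and the host in the WARN line is the same address passed as `--db-host`.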

pgo show backup mycluster


cluster: mycluster
storage type: posix

stanza: db
    status: missing stanza path
    cipher:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

https://github.com/CrunchyData/postgres-operator/pull/2362

This will be available in 4.7.0 and 4.6.3 when they are released.