cloudnative-pg: S3 backup size growing indefinitely

Background

Kubernetes server version: EKS 1.25.10
CloudNativePG version: [screenshot]

The following manifest is deployed:

apiVersion: v1
data:
  password: *****************************************
  username: *****************************************
kind: Secret
metadata:
  name: app-superuser-secret
type: kubernetes.io/basic-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: backup-creds
data:
  ACCESS_KEY_ID: *****************************************
  ACCESS_SECRET_KEY: *****************************************
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-prod
spec:
  description: "app Production database"
  imageName: ghcr.io/cloudnative-pg/postgresql:15.2
  instances: 1

  postgresql:
    pg_hba:
      - "host all all all md5"

  superuserSecret:
    name: app-superuser-secret

  storage:
    size: 300Gi

  backup:
    barmanObjectStore:
      destinationPath: "s3://dest-bucket/db_bak/"
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"

  resources:
    requests:
      memory: "12Gi"
      cpu: "3"
    limits:
      memory: "12Gi"
      cpu: "3"

The disk space used in the container, measured by running df -h inside it, is barely 28 GB, as you can see in the following screenshot: [screenshot]

Issue

The bucket size is inexorably increasing, as you can see from this screenshot (taken over 2 months of monitoring): [screenshot]

Even though the retention policy is set to 30d, the S3 bucket keeps growing unexpectedly and has reached more than 3 TB.

Do you have any suggestions for avoiding this issue and saving some space without losing the ability to restore the database?

Thanks.

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 20 (8 by maintainers)

Most upvoted comments

I also seem to be facing this issue, but perhaps it’s because scheduled backup objects are not being deleted automatically? For example, I have retentionPolicy set to 14 days and daily scheduled backups, but I don’t believe cloudnative-pg has a mechanism to delete scheduled backups automatically, so I’ve still got more than 14 days of WALs. Deleting a Backup resource doesn’t seem to do anything either; it hasn’t removed anything from S3.

EDIT: The retention policy seems to be working now; it was quietly failing to remove objects due to a bad S3 API implementation that is now resolved on my end.

Did you change something to solve the issue?

I was using Google Cloud Storage’s S3 interoperability layer, which lacked an S3 API endpoint that barman was using (I found this by looking into the logs). I switched to the native GCS support in barman/CloudNativePG, and it seems to work now.
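For reference, switching to the native GCS support means replacing the s3Credentials block in the Cluster’s barmanObjectStore with googleCredentials and a gs:// destination. A minimal sketch of the backup stanza, where the secret name backup-creds-gcs and key gcs-credentials.json are hypothetical stand-ins for a service-account key:

  backup:
    barmanObjectStore:
      # gs:// selects barman's native GCS client instead of S3 interoperability
      destinationPath: "gs://dest-bucket/db_bak/"
      googleCredentials:
        applicationCredentials:
          name: backup-creds-gcs
          key: gcs-credentials.json
    retentionPolicy: "14d"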

I’ve just checked: the latest 15.3-8 image already contains barman 3.7.0 😃

Yeah, that’s the issue: you are not actually backing up your cluster by just archiving the WAL files. A PostgreSQL physical backup is composed of a base backup plus the associated WALs. You need to set up scheduled backups, and on the next backup execution the retention policy will be enforced and all the unused WALs in the bucket will be removed. If you don’t want to wait for the cron, you can trigger a new backup via the cnpg plugin (or create one via the Backup CRD); see the sketch below.
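For illustration, a minimal sketch of both options, assuming the app-prod cluster from the manifest above; the resource names app-prod-daily and app-prod-ondemand are hypothetical, and the schedule uses the six-field cron format (leading seconds field) that ScheduledBackup expects:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: app-prod-daily
spec:
  # Six-field cron expression, seconds first: every day at midnight
  schedule: "0 0 0 * * *"
  backupOwnerReference: self
  cluster:
    name: app-prod
---
# One-off backup, equivalent to what the cnpg plugin creates
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: app-prod-ondemand
spec:
  cluster:
    name: app-prod

Alternatively, kubectl cnpg backup app-prod creates the one-off Backup for you. Once a base backup completes, the 30d retention policy can be enforced, and obsolete base backups and their WALs will be pruned from the bucket.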