cloudnative-pg: S3 Backup size indefinitely growing
Background
Kubernetes server version: EKS 1.25.10
Cloudnative-pg version:
The following manifest is deployed:
apiVersion: v1
data:
  password: *****************************************
  username: *****************************************
kind: Secret
metadata:
  name: app-superuser-secret
type: kubernetes.io/basic-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: backup-creds
data:
  ACCESS_KEY_ID: *****************************************
  ACCESS_SECRET_KEY: *****************************************
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-prod
spec:
  description: "app Production database"
  imageName: ghcr.io/cloudnative-pg/postgresql:15.2
  instances: 1
  postgresql:
    pg_hba:
      - "host all all all md5"
  superuserSecret:
    name: app-superuser-secret
  storage:
    size: 300Gi
  backup:
    barmanObjectStore:
      destinationPath: "s3://dest-bucket/db_bak/"
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"
  resources:
    requests:
      memory: "12Gi"
      cpu: "3"
    limits:
      memory: "12Gi"
      cpu: "3"
Running df -h inside the container shows barely 28 GB of used disk space, as shown in the following screenshot:
Issue
The bucket size is growing inexorably, as you can see from this screenshot (taken over two months of monitoring):
Even though the retention policy is set to 30d, the S3 bucket keeps growing unexpectedly and has now exceeded 3 TB.
Do you have any suggestions on how to avoid this issue and reclaim space without losing the ability to restore the database?
Thanks.
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 20 (8 by maintainers)
I was using Google Cloud Storage’s S3 interoperability, which lacked an S3 API endpoint that barman seemed to be using (based on the logs). I switched to the GCS support that is native to barman/CloudNativePG and it seems to work now.
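Roughly, switching to native GCS support means replacing the s3Credentials block with googleCredentials and using a gs:// destination. A minimal sketch, assuming a secret named backup-creds-gcs whose gcsCredentials key holds a service account JSON (both names are placeholders, not from the original report):

  backup:
    barmanObjectStore:
      destinationPath: "gs://dest-bucket/db_bak/"
      googleCredentials:
        applicationCredentials:
          name: backup-creds-gcs   # hypothetical secret with the service account JSON key
          key: gcsCredentials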
I’ve just checked - latest 15.3-8 image already contains barman 3.7.0 😃
Yeah, that’s the issue: you are not actually backing up your cluster by just archiving the WAL files. A PostgreSQL physical backup is composed of a base backup plus the associated WALs. You need to set up scheduled backups; on the next backup execution the retention policy will be enforced and all the unused WALs in the bucket will be removed. If you don’t want to wait for the cron, you can trigger a new backup via the cnpg plugin (or create one via the Backup CRD).
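As a sketch of what that could look like for the cluster above (the resource names are illustrative; the schedule uses CloudNativePG’s six-field cron format with seconds first), a daily ScheduledBackup might be:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: app-prod-daily
spec:
  schedule: "0 0 0 * * *"   # every day at midnight
  cluster:
    name: app-prod

An on-demand backup can then be requested either with the plugin (kubectl cnpg backup app-prod) or by creating a Backup resource referencing the same cluster:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: app-prod-on-demand
spec:
  cluster:
    name: app-prod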