cloudnative-pg: Restore from backup fails with wrong permissions

Hi! I get a permission error on pgdata when creating a cluster with the recovery option (bootstrapping from a backup of another existing cluster):

{"level":"info","ts":1674558339.3954432,"msg":"barman-cloud-check-wal-archive checking the first wal","logging_pod":"med-1"}
{"level":"info","ts":1674558339.733525,"msg":"Recovering existing backup","logging_pod":"med-1","backup":{"metadata":{"name":"med-pgbackup","namespace":"database","uid":"ec528b6a-4332-4169-911d-4e1d1f371c23","resourceVersion":"111945109","generation":1,"creationTimestamp":"2023-01-24T08:07:13Z","annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"postgresql.cnpg.io/v1\",\"kind\":\"Backup\",\"metadata\":{\"annotations\":{},\"name\":\"med-pgbackup\",\"namespace\":\"database\"},\"spec\":{\"cluster\":{\"name\":\"medpg\"}}}\n"},"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"postgresql.cnpg.io/v1","time":"2023-01-24T08:07:13Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:cluster":{".":{},"f:name":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"postgresql.cnpg.io/v1","time":"2023-01-24T08:07:59Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:backupId":{},"f:beginLSN":{},"f:beginWal":{},"f:destinationPath":{},"f:endLSN":{},"f:endWal":{},"f:endpointURL":{},"f:instanceID":{".":{},"f:ContainerID":{},"f:podName":{}},"f:phase":{},"f:s3Credentials":{".":{},"f:accessKeyId":{".":{},"f:key":{},"f:name":{}},"f:inheritFromIAMRole":{},"f:secretAccessKey":{".":{},"f:key":{},"f:name":{}}},"f:serverName":{},"f:startedAt":{},"f:stoppedAt":{}}},"subresource":"status"}]},"spec":{"cluster":{"name":"medpg"}},"status":{"s3Credentials":{"accessKeyId":{"name":"miniobackup","key":"ACCESS_KEY_ID"},"secretAccessKey":{"name":"miniobackup","key":"ACCESS_SECRET_KEY"},"inheritFromIAMRole":false},"endpointURL":"http://minio.gitlab.svc.cluster.local:9000","destinationPath":"s3://postgresql-backup","serverName":"medpg","backupId":"20230124T080714","phase":"completed","startedAt":"2023-01-24T08:07:14Z","stoppedAt":"2023-01-24T08:07:22Z","beginWal":"000000040000002D00000029","endWal":"000000040000002D00000029","beginLSN":"2D/29000028","endLSN":"2D/29000138","instanceID":{"podName":"medpg-3","ContainerID":"containerd://f00fce6da6aafe9288135af641d81621d8a379a819ce63eb0c2ab8cb424ffed5"}}}}
{"level":"info","ts":1674558339.733743,"msg":"Starting barman-cloud-restore","logging_pod":"med-1","options":["--endpoint-url","http://minio.gitlab.svc.cluster.local:9000","s3://postgresql-backup","medpg","20230124T080714","--cloud-provider","aws-s3","/var/lib/postgresql/data/pgdata"]}
{"level":"info","ts":1674558342.6783388,"msg":"Restore completed","logging_pod":"med-1"}
{"level":"info","ts":1674558342.6784635,"msg":"Creating new data directory","logging_pod":"med-1","pgdata":"/controller/recovery/datadir_1857835105","initDbOptions":["--username","postgres","-D","/controller/recovery/datadir_1857835105","--no-sync"]}
{"level":"info","ts":1674558343.1420522,"logger":"initdb","msg":"The files belonging to this database system will be owned by user \"postgres\".\nThis user must also own the server process.\n\nThe database cluster will be initialized with locale \"en_US.utf8\".\nThe default database encoding has accordingly been set to \"UTF8\".\nThe default text search configuration will be set to \"english\".\n\nData page checksums are disabled.\n\nfixing permissions on existing directory /controller/recovery/datadir_1857835105 ... ok\ncreating subdirectories ... ok\nselecting dynamic shared memory implementation ... posix\nselecting default max_connections ... 100\nselecting default shared_buffers ... 128MB\nselecting default time zone ... Etc/UTC\ncreating configuration files ... ok\nrunning bootstrap script ... ok\nperforming post-bootstrap initialization ... ok\n\nSync to disk skipped.\nThe data directory might become corrupt if the operating system crashes.\n\n\nSuccess. You can now start the database server using:\n\n    pg_ctl -D /controller/recovery/datadir_1857835105 -l logfile start\n\n","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.142085,"logger":"initdb","msg":"initdb: warning: enabling \"trust\" authentication for local connections\nYou can change this by editing pg_hba.conf or using the option -A, or\n--auth-local and --auth-host, the next time you run initdb.\n","pipe":"stderr","logging_pod":"med-1"}
{"level":"info","ts":1674558343.1474771,"msg":"Installed configuration file","logging_pod":"med-1","pgdata":"/controller/recovery/datadir_1857835105","filename":"pg_hba.conf"}
{"level":"info","ts":1674558343.147517,"msg":"Ignore minSyncReplicas to enforce self-healing","logging_pod":"med-1","syncReplicas":-1,"minSyncReplicas":0,"maxSyncReplicas":0}
{"level":"info","ts":1674558343.1526413,"msg":"Installed configuration file","logging_pod":"med-1","pgdata":"/controller/recovery/datadir_1857835105","filename":"custom.conf"}
{"level":"info","ts":1674558343.1812582,"msg":"Generated recovery configuration","logging_pod":"med-1","configuration":"recovery_target_action = promote\nrestore_command = 'barman-cloud-wal-restore --endpoint-url http://minio.gitlab.svc.cluster.local:9000 s3://postgresql-backup medpg --cloud-provider aws-s3 %f %p'\n"}
{"level":"info","ts":1674558343.1885922,"msg":"enforcing parameters found in pg_controldata","logging_pod":"med-1","parameters":{"max_connections":"100","max_locks_per_transaction":"64","max_prepared_transactions":"0","max_wal_senders":"10","max_worker_processes":"32"}}
{"level":"info","ts":1674558343.1914668,"msg":"Starting up instance","logging_pod":"med-1","pgdata":"/var/lib/postgresql/data/pgdata","options":["start","-w","-D","/var/lib/postgresql/data/pgdata","-o","-c port=5432 -c unix_socket_directories=/controller/run","-t 40000000","-o","-c listen_addresses='127.0.0.1'"]}
{"level":"info","ts":1674558343.2063065,"logger":"pg_ctl","msg":"waiting for server to start....2023-01-24 11:05:43.206 UTC [42] FATAL:  data directory \"/var/lib/postgresql/data/pgdata\" has invalid permissions","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2063231,"logger":"pg_ctl","msg":"2023-01-24 11:05:43.206 UTC [42] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2993183,"logger":"pg_ctl","msg":" stopped waiting","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2993183,"logger":"pg_ctl","msg":"pg_ctl: could not start server","pipe":"stderr","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2993455,"logger":"pg_ctl","msg":"Examine the log output.","pipe":"stderr","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2994967,"msg":"Exited log pipe","fileName":"/controller/log/postgres.csv","logging_pod":"med-1"}
{"level":"error","ts":1674558343.299535,"msg":"Error while restoring a backup","logging_pod":"med-1","error":"while activating instance: error starting PostgreSQL instance: exit status 1","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:127\ngithub.com/cloudnative-pg/cloudnative-pg/pkg/management/log.Error\n\tpkg/management/log/log.go:165\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.restoreSubCommand\n\tinternal/cmd/manager/instance/restore/cmd.go:84\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.NewCmd.func2\n\tinternal/cmd/manager/instance/restore/cmd.go:59\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\tcmd/manager/main.go:64\nruntime.main\n\t/opt/hostedtoolcache/go/1.19.4/x64/src/runtime/proc.go:250"}

The cluster was created with this CR:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: med
  namespace: database
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://postgresql-backup
      endpointURL: "http://minio.gitlab.svc.cluster.local:9000"
      s3Credentials:
        accessKeyId:
          key: ACCESS_KEY_ID
          name: miniobackup
        secretAccessKey:
          key: ACCESS_SECRET_KEY
          name: miniobackup
      wal:
        compression: gzip
    retentionPolicy: "3d"
  imageName: ghcr.io/cloudnative-pg/postgresql:14.6-10
  instances: 3
  storage:
    size: 5Gi
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: openebs-hostpath
      volumeMode: Filesystem
  affinity:
    nodeSelector:
      database: "true"
  bootstrap:
    recovery:
      backup:
        name: med-pgbackup
      secret:
        name: medpg-password
  monitoring:
    enablePodMonitor: true

cloudnative-pg is installed with Helm (chart version 0.16.1); the operator is at version ghcr.io/cloudnative-pg/cloudnative-pg:1.18.1. The storage class used is openebs-hostpath (a test with openebs-jiva gave the same result).
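For context, a typical Helm install of the operator looks roughly like this (the repo URL is the project's official chart repo; the release and namespace names are common defaults, not taken from this report):

$ helm repo add cnpg https://cloudnative-pg.github.io/charts
$ helm upgrade --install cnpg cnpg/cloudnative-pg \
    --namespace cnpg-system --create-namespace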

I see a similar problem addressed by patch #625 and merged PR #1164, but the fix does not seem to work in this case. Maybe that fix only works for a new database (initdb)?

About this issue

  • State: closed
  • Created a year ago
  • Comments: 28 (9 by maintainers)

Most upvoted comments

Hello all!!

Thanks to @chris-milsted we have found the issue: it's related to fsGroup: 26. With some CSI drivers, this adds the suid bit to the group permissions of the directories when the volume is mounted. Now that we know that, we will work on a fix.

Best Regards!
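To make the failure mode above concrete (an illustrative sketch, not output from this thread): with fsGroup set on the pod, some CSI drivers recursively chgrp/chmod the volume on mount, so the data directory can end up with extra group bits, for example:

$ stat -c '%a %U:%G' /var/lib/postgresql/data/pgdata
2770 postgres:26    # setgid bit plus group rwx, instead of a plain 0700 or 0750

PostgreSQL's startup check (the DETAIL line in the log above) only accepts u=rwx (0700) or u=rwx,g=rx (0750), so anything beyond that aborts the instance.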

1.20.2 fixed it. Thanks all!

In my case it looks like the chmod is not being called and we just hit the error. This is setting up a brand-new database, and the job throws this error and nothing more:

$ kubectl logs jobs/standalone-db-1-initdb |grep 750
Found 7 pods, using pod/standalone-db-1-initdb-bk9lw
Defaulted container "initdb" out of: initdb, bootstrap-controller (init)
{"level":"info","ts":"2023-06-27T10:23:47Z","logger":"pg_ctl","msg":"2023-06-27 10:23:47.206 UTC [27] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).","pipe":"stdout","logging_pod":"standalone-db-1-initdb"}

The fix is ready to be tested. If someone wants to try it out, use the following operator image within your operator deployment: ghcr.io/cloudnative-pg/cloudnative-pg-testing:dev-1354. Just change the deployment's image to this one; a sketch of how follows.
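One way to do the swap, assuming a default manifest install (the deployment and container names below are the usual defaults and may differ in your setup, e.g. with a Helm release):

$ kubectl set image -n cnpg-system deployment/cnpg-controller-manager \
    manager=ghcr.io/cloudnative-pg/cloudnative-pg-testing:dev-1354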

Help needed here to test!

Best Regards everyone!