cloudnative-pg: Restore from backup fails with wrong permissions
Hi! I get a permission error on pgdata when creating a cluster with the recovery option (restoring from another existing cluster):
{"level":"info","ts":1674558339.3954432,"msg":"barman-cloud-check-wal-archive checking the first wal","logging_pod":"med-1"}
{"level":"info","ts":1674558339.733525,"msg":"Recovering existing backup","logging_pod":"med-1","backup":{"metadata":{"name":"med-pgbackup","namespace":"database","uid":"ec528b6a-4332-4169-911d-4e1d1f371c23","resourceVersion":"111945109","generation":1,"creationTimestamp":"2023-01-24T08:07:13Z","annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"postgresql.cnpg.io/v1\",\"kind\":\"Backup\",\"metadata\":{\"annotations\":{},\"name\":\"med-pgbackup\",\"namespace\":\"database\"},\"spec\":{\"cluster\":{\"name\":\"medpg\"}}}\n"},"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"postgresql.cnpg.io/v1","time":"2023-01-24T08:07:13Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:cluster":{".":{},"f:name":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"postgresql.cnpg.io/v1","time":"2023-01-24T08:07:59Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:backupId":{},"f:beginLSN":{},"f:beginWal":{},"f:destinationPath":{},"f:endLSN":{},"f:endWal":{},"f:endpointURL":{},"f:instanceID":{".":{},"f:ContainerID":{},"f:podName":{}},"f:phase":{},"f:s3Credentials":{".":{},"f:accessKeyId":{".":{},"f:key":{},"f:name":{}},"f:inheritFromIAMRole":{},"f:secretAccessKey":{".":{},"f:key":{},"f:name":{}}},"f:serverName":{},"f:startedAt":{},"f:stoppedAt":{}}},"subresource":"status"}]},"spec":{"cluster":{"name":"medpg"}},"status":{"s3Credentials":{"accessKeyId":{"name":"miniobackup","key":"ACCESS_KEY_ID"},"secretAccessKey":{"name":"miniobackup","key":"ACCESS_SECRET_KEY"},"inheritFromIAMRole":false},"endpointURL":"http://minio.gitlab.svc.cluster.local:9000","destinationPath":"s3://postgresql-backup","serverName":"medpg","backupId":"20230124T080714","phase":"completed","startedAt":"2023-01-24T08:07:14Z","stoppedAt":"2023-01-24T08:07:22Z","beginWal":"000000040000002D00000029","endWal":"000000040000002D00000029","beginLSN":"2D/29000028","endLSN":"2D/29000138","instanceID":{"podName":"medpg-3","ContainerID":"containerd://f00fce6da6aafe9288135af641d81621d8a379a819ce63eb0c2ab8cb424ffed5"}}}}
{"level":"info","ts":1674558339.733743,"msg":"Starting barman-cloud-restore","logging_pod":"med-1","options":["--endpoint-url","http://minio.gitlab.svc.cluster.local:9000","s3://postgresql-backup","medpg","20230124T080714","--cloud-provider","aws-s3","/var/lib/postgresql/data/pgdata"]}
{"level":"info","ts":1674558342.6783388,"msg":"Restore completed","logging_pod":"med-1"}
{"level":"info","ts":1674558342.6784635,"msg":"Creating new data directory","logging_pod":"med-1","pgdata":"/controller/recovery/datadir_1857835105","initDbOptions":["--username","postgres","-D","/controller/recovery/datadir_1857835105","--no-sync"]}
{"level":"info","ts":1674558343.1420522,"logger":"initdb","msg":"The files belonging to this database system will be owned by user \"postgres\".\nThis user must also own the server process.\n\nThe database cluster will be initialized with locale \"en_US.utf8\".\nThe default database encoding has accordingly been set to \"UTF8\".\nThe default text search configuration will be set to \"english\".\n\nData page checksums are disabled.\n\nfixing permissions on existing directory /controller/recovery/datadir_1857835105 ... ok\ncreating subdirectories ... ok\nselecting dynamic shared memory implementation ... posix\nselecting default max_connections ... 100\nselecting default shared_buffers ... 128MB\nselecting default time zone ... Etc/UTC\ncreating configuration files ... ok\nrunning bootstrap script ... ok\nperforming post-bootstrap initialization ... ok\n\nSync to disk skipped.\nThe data directory might become corrupt if the operating system crashes.\n\n\nSuccess. You can now start the database server using:\n\n pg_ctl -D /controller/recovery/datadir_1857835105 -l logfile start\n\n","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.142085,"logger":"initdb","msg":"initdb: warning: enabling \"trust\" authentication for local connections\nYou can change this by editing pg_hba.conf or using the option -A, or\n--auth-local and --auth-host, the next time you run initdb.\n","pipe":"stderr","logging_pod":"med-1"}
{"level":"info","ts":1674558343.1474771,"msg":"Installed configuration file","logging_pod":"med-1","pgdata":"/controller/recovery/datadir_1857835105","filename":"pg_hba.conf"}
{"level":"info","ts":1674558343.147517,"msg":"Ignore minSyncReplicas to enforce self-healing","logging_pod":"med-1","syncReplicas":-1,"minSyncReplicas":0,"maxSyncReplicas":0}
{"level":"info","ts":1674558343.1526413,"msg":"Installed configuration file","logging_pod":"med-1","pgdata":"/controller/recovery/datadir_1857835105","filename":"custom.conf"}
{"level":"info","ts":1674558343.1812582,"msg":"Generated recovery configuration","logging_pod":"med-1","configuration":"recovery_target_action = promote\nrestore_command = 'barman-cloud-wal-restore --endpoint-url http://minio.gitlab.svc.cluster.local:9000 s3://postgresql-backup medpg --cloud-provider aws-s3 %f %p'\n"}
{"level":"info","ts":1674558343.1885922,"msg":"enforcing parameters found in pg_controldata","logging_pod":"med-1","parameters":{"max_connections":"100","max_locks_per_transaction":"64","max_prepared_transactions":"0","max_wal_senders":"10","max_worker_processes":"32"}}
{"level":"info","ts":1674558343.1914668,"msg":"Starting up instance","logging_pod":"med-1","pgdata":"/var/lib/postgresql/data/pgdata","options":["start","-w","-D","/var/lib/postgresql/data/pgdata","-o","-c port=5432 -c unix_socket_directories=/controller/run","-t 40000000","-o","-c listen_addresses='127.0.0.1'"]}
{"level":"info","ts":1674558343.2063065,"logger":"pg_ctl","msg":"waiting for server to start....2023-01-24 11:05:43.206 UTC [42] FATAL: data directory \"/var/lib/postgresql/data/pgdata\" has invalid permissions","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2063231,"logger":"pg_ctl","msg":"2023-01-24 11:05:43.206 UTC [42] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2993183,"logger":"pg_ctl","msg":" stopped waiting","pipe":"stdout","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2993183,"logger":"pg_ctl","msg":"pg_ctl: could not start server","pipe":"stderr","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2993455,"logger":"pg_ctl","msg":"Examine the log output.","pipe":"stderr","logging_pod":"med-1"}
{"level":"info","ts":1674558343.2994967,"msg":"Exited log pipe","fileName":"/controller/log/postgres.csv","logging_pod":"med-1"}
{"level":"error","ts":1674558343.299535,"msg":"Error while restoring a backup","logging_pod":"med-1","error":"while activating instance: error starting PostgreSQL instance: exit status 1","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:127\ngithub.com/cloudnative-pg/cloudnative-pg/pkg/management/log.Error\n\tpkg/management/log/log.go:165\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.restoreSubCommand\n\tinternal/cmd/manager/instance/restore/cmd.go:84\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.NewCmd.func2\n\tinternal/cmd/manager/instance/restore/cmd.go:59\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\tcmd/manager/main.go:64\nruntime.main\n\t/opt/hostedtoolcache/go/1.19.4/x64/src/runtime/proc.go:250"}
Started with this CR:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: med
  namespace: database
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://postgresql-backup
      endpointURL: "http://minio.gitlab.svc.cluster.local:9000"
      s3Credentials:
        accessKeyId:
          key: ACCESS_KEY_ID
          name: miniobackup
        secretAccessKey:
          key: ACCESS_SECRET_KEY
          name: miniobackup
      wal:
        compression: gzip
    retentionPolicy: "3d"
  imageName: ghcr.io/cloudnative-pg/postgresql:14.6-10
  instances: 3
  storage:
    size: 5Gi
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: openebs-hostpath
      volumeMode: Filesystem
  affinity:
    nodeSelector:
      database: "true"
  bootstrap:
    recovery:
      backup:
        name: med-pgbackup
      secret:
        name: medpg-password
  monitoring:
    enablePodMonitor: true
cloudnative-pg is installed with Helm (chart version 0.16.1); the operator is at version ghcr.io/cloudnative-pg/cloudnative-pg:1.18.1. The storage class used is openebs-hostpath (a test with openebs-jiva gave the same result).
I see a similar problem addressed by patch #625 and merged PR #1164, but the fix seems not to work in this case. Maybe it only applies to a new database (the initdb bootstrap path) and not to a restore?
Commits related to this issue
- fix: ensure PGDATA permissions after initdb In some system the CSI driver may add a suid to the directory when the `fsGroup` in the security context it's enable, making imposible to PostgreSQL to sta... — committed to cloudnative-pg/cloudnative-pg by sxd a year ago
- fix: ensure correct PGDATA permissions for initdb and restore (#2384) Certain CSI drivers may add setgid permissions on newly created folders. A default umask is set to attempt to avoid this, by revo... — committed to cloudnative-pg/cloudnative-pg by sxd a year ago
Hello all!!
Thanks to @chris-milsted we have found the issue: it's related to fsGroup: 26. With some CSI drivers, this adds the setgid bit to the directories' group permissions when the volume is mounted. Now that we know that, we will work on a fix.
Best Regards!
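To illustrate the mechanism (a local sketch, not CNPG-specific): the setgid bit turns an otherwise valid 0750 directory into 2750, which PostgreSQL's startup check rejects:

    # Reproduce the rejected mode locally: 2750 is 0750 plus the setgid
    # bit that some CSI drivers set when fsGroup is applied to the mount.
    mkdir /tmp/pgdata-demo
    chmod 2750 /tmp/pgdata-demo
    stat -c '%a %A' /tmp/pgdata-demo   # -> 2750 drwxr-s---
    # PostgreSQL accepts only 700 or 750, so the extra bit is fatal.
    # Clearing it is what the fix effectively does:
    chmod g-s /tmp/pgdata-demo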
1.20.2 fixed it. Thanks all!
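For anyone else landing here: picking up a fixed release on a Helm install is just a chart upgrade. A sketch, assuming the default chart and repo names and a release called cnpg in the cnpg-system namespace:

    helm repo update
    helm upgrade cnpg cloudnative-pg/cloudnative-pg -n cnpg-system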
In my case it looks like the chmod is not being called and we just hit the error. This is actually just setting up a new database, and the job throws this error and nothing more.
The fix is ready to be tested. If someone wants to try it out, you can use the operator image ghcr.io/cloudnative-pg/cloudnative-pg-testing:dev-1354 within your operator deployment; just swap the image for this one.
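For example (the deployment and namespace names below assume a default Helm install; adjust them to match yours):

    # Swap the operator image for the test build; "manager" is the
    # operator container name in the stock deployment.
    kubectl set image deployment/cnpg-cloudnative-pg \
      manager=ghcr.io/cloudnative-pg/cloudnative-pg-testing:dev-1354 \
      -n cnpg-system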
Help needed here to test!
Best Regards everyone!