postgres-operator: postgres-startup init container fails
Overview
When I create a postgres cluster, the instance crashes during initialization. The bug happens only on Kubernetes 1.22.
Environment
- Platform: Kubernetes on OVH Managed Kubernetes Service
- Platform Version: 1.22.2
- PGO Image Tag: ubi8-5.0.4-0
- Postgres Version: 13.5
- Storage: csi-cinder-classic
Steps to Reproduce
REPRO
- Install the postgres-operator example with kustomize
- Install the postgres hippo cluster defined in the example
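For reference, the two repro steps roughly correspond to these kustomize commands; the repo URL and kustomize paths are assumptions based on the postgres-operator-examples layout for PGO 5.0.x and may differ for other versions:

```shell
# Hypothetical sketch of the repro steps; adjust paths to your PGO version.
git clone https://github.com/CrunchyData/postgres-operator-examples.git
cd postgres-operator-examples

# Step 1: install the operator itself via kustomize
kubectl apply -k kustomize/install

# Step 2: create the example hippo PostgresCluster
kubectl apply -k kustomize/postgres
```

Both steps require a running cluster and cluster-admin access, so run them against a disposable environment when reproducing.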
EXPECTED
- The postgres cluster starts correctly.
ACTUAL
The postgres-startup container fails to initialize the cluster:
Initializing ...
::postgres-operator: uid::26
::postgres-operator: gid::26
::postgres-operator: postgres path::/usr/pgsql-13/bin/postgres
::postgres-operator: postgres version::postgres (PostgreSQL) 13.5
::postgres-operator: config directory::/pgdata/pg13
::postgres-operator: data directory::/pgdata/pg13
install: cannot change permissions of ‘/pgdata/pg13’: No such file or directory
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 40s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 38s default-scheduler Successfully assigned postgres-operator/hippo-instance1-mzs4-0 to node-b4422cc5-9dc5-4116-a4d5-a194f6171309
Normal SuccessfulAttachVolume 35s attachdetach-controller AttachVolume.Attach succeeded for volume "ovh-managed-kubernetes-uo1sfr-pvc-729b89ad-c0dd-48f9-847b-36e36d8cc8e6"
Normal Pulled 30s kubelet Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.260460237s
Normal Pulled 28s kubelet Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.232337581s
Normal Pulling 15s (x3 over 32s) kubelet Pulling image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0"
Normal Created 14s (x3 over 30s) kubelet Created container postgres-startup
Normal Started 14s (x3 over 30s) kubelet Started container postgres-startup
Normal Pulled 14s kubelet Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.226427728s
Warning BackOff 13s (x3 over 27s) kubelet Back-off restarting failed container
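When debugging this class of failure, it can help to check whether the StorageClass declares a filesystem type at all, since a missing `fsType` is what the later comments point to. A hedged sketch, assuming the StorageClass name, namespace, and cluster name from this report:

```shell
# Print the fsType parameter of the StorageClass (empty output = not set)
kubectl get sc csi-cinder-classic -o jsonpath='{.parameters.fsType}{"\n"}'

# List the PVCs PGO created for the hippo cluster
kubectl get pvc -n postgres-operator \
  -l postgres-operator.crunchydata.com/cluster=hippo
```

If the first command prints nothing, the CSI driver chooses the filesystem itself, which is where the permission mismatch can creep in.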
About this issue
- State: closed
- Created 3 years ago
- Reactions: 8
- Comments: 34 (8 by maintainers)
Commits related to this issue
- Log errors when the PostgreSQL data directory is wrong The postgres-startup container now reports when it finds the installed PostgreSQL binaries do not match the specified PostgreSQL version. Some ... — committed to benjaminjb/postgres-operator by benjaminjb 2 years ago
- Migration assistance (#3445) * Log errors when the PostgreSQL data directory is wrong The postgres-startup container now reports when it finds the installed PostgreSQL binaries do not match the s... — committed to CrunchyData/postgres-operator by benjaminjb 2 years ago
- K8SPG-437: merge upstream 5.4.2 changes (#518) * Replace HandleDeleteNamespace Test With KUTTL (#3172) TestReconcilerHandleDeleteNamespace was prone to flakes when run with `envtest-existing`, an... — committed to percona/percona-postgresql-operator by pooknull 9 months ago
I’m fairly confident this is similar to an issue that prompted #2897 and should be fixed by that change.
The interim solution is to set the following in your PostgresCluster spec:
I stumbled upon this with RedHat’s Code Ready Containers when going through the examples:
The logs:
My environment:
The operator detects that it’s running on OpenShift but doesn’t seem to attach an appropriate SCC to the hippo-instance Role.
I have this same issue on my minikube (3 nodes). It works, however, when I use a single-node installation of minikube.
Hi there, I’ve got the same issue on OVH Public Cloud (the postgres-startup container failed with install: cannot change permissions of ‘/pgdata/pg14’: No such file or directory). This article (https://docs.ovh.com/sg/en/kubernetes/persistentvolumes-permission-errors/) solved my issue: I recreated both storage classes with the additional fsType: ext4 parameter.
TL;DR: Recreate the StorageClass with parameters.fsType: ext4.
Linode support responded to me, saying: “I agree that this appears to be related to a change on our end. To be precise, our CSI Driver had an update that can cause permissions issues with mounting Volumes for certain deployments. We’ve seen this is especially prevalent with PostgreSQL deployments.”
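The fsType fix described above can be sketched as a replacement StorageClass manifest. This is a minimal sketch, not the exact OVH-provided class: the name is hypothetical, and the provisioner shown is the standard OpenStack Cinder CSI provisioner, which may differ in your cluster (check `kubectl get sc csi-cinder-classic -o yaml`).

```yaml
# Hypothetical replacement for csi-cinder-classic with an explicit fsType.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cinder-classic-ext4
provisioner: cinder.csi.openstack.org   # assumption: verify against your existing class
parameters:
  fsType: ext4   # explicit filesystem so fsGroup chown/chmod works on mount
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

Note that StorageClass parameters are immutable, so to reuse the original name you must delete the existing class before applying the new manifest; existing PVs are unaffected, but new PVCs will pick up the ext4 setting.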
They proposed a workaround: “Run the command below and then redeploy your workloads.”
This worked for me, and I hope it will help someone here.