postgres-operator: postgres-startup init container fails
Overview
When I create a postgres cluster, the instance crashes during initialization. The bug happens only on Kubernetes 1.22.
Environment
- Platform: Kubernetes on OVH Managed Kubernetes Service
- Platform Version: 1.22.2
- PGO Image Tag: ubi8-5.0.4-0
- Postgres Version: 13.5
- Storage: csi-cinder-classic
Steps to Reproduce
REPRO
- Install the postgres-operator example with kustomize
- Install the postgres hippo cluster defined in the example
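For reference, the two repro steps roughly correspond to these kustomize commands; the repo URL and kustomize paths are assumptions based on the postgres-operator-examples layout for PGO 5.0.x and may differ for other versions:

```shell
# Hypothetical sketch of the repro steps; adjust paths to your PGO version.
git clone https://github.com/CrunchyData/postgres-operator-examples.git
cd postgres-operator-examples

# Step 1: install the operator itself via kustomize
kubectl apply -k kustomize/install

# Step 2: create the example hippo PostgresCluster
kubectl apply -k kustomize/postgres
```

Both steps require a running cluster and cluster-admin access, so run them against a disposable environment when reproducing.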
EXPECTED
- The postgres cluster starts correctly.
ACTUAL
The postgres-startup container fails to initialize the cluster:
Initializing ...
::postgres-operator: uid::26
::postgres-operator: gid::26
::postgres-operator: postgres path::/usr/pgsql-13/bin/postgres
::postgres-operator: postgres version::postgres (PostgreSQL) 13.5
::postgres-operator: config directory::/pgdata/pg13
::postgres-operator: data directory::/pgdata/pg13
install: cannot change permissions of ‘/pgdata/pg13’: No such file or directory
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 40s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 38s default-scheduler Successfully assigned postgres-operator/hippo-instance1-mzs4-0 to node-b4422cc5-9dc5-4116-a4d5-a194f6171309
Normal SuccessfulAttachVolume 35s attachdetach-controller AttachVolume.Attach succeeded for volume "ovh-managed-kubernetes-uo1sfr-pvc-729b89ad-c0dd-48f9-847b-36e36d8cc8e6"
Normal Pulled 30s kubelet Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.260460237s
Normal Pulled 28s kubelet Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.232337581s
Normal Pulling 15s (x3 over 32s) kubelet Pulling image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0"
Normal Created 14s (x3 over 30s) kubelet Created container postgres-startup
Normal Started 14s (x3 over 30s) kubelet Started container postgres-startup
Normal Pulled 14s kubelet Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.226427728s
Warning BackOff 13s (x3 over 27s) kubelet Back-off restarting failed container
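When debugging this class of failure, it can help to check whether the StorageClass declares a filesystem type at all, since a missing `fsType` is what the later comments point to. A hedged sketch, assuming the StorageClass name, namespace, and cluster name from this report:

```shell
# Print the fsType parameter of the StorageClass (empty output = not set)
kubectl get sc csi-cinder-classic -o jsonpath='{.parameters.fsType}{"\n"}'

# List the PVCs PGO created for the hippo cluster
kubectl get pvc -n postgres-operator \
  -l postgres-operator.crunchydata.com/cluster=hippo
```

If the first command prints nothing, the CSI driver chooses the filesystem itself, which is where the permission mismatch can creep in.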
About this issue
- State: closed
- Created 3 years ago
- Reactions: 8
- Comments: 34 (8 by maintainers)
Commits related to this issue
- Log errors when the PostgreSQL data directory is wrong The postgres-startup container now reports when it finds the installed PostgreSQL binaries do not match the specified PostgreSQL version. Some ... — committed to benjaminjb/postgres-operator by benjaminjb 2 years ago
- Migration assistance (#3445) * Log errors when the PostgreSQL data directory is wrong The postgres-startup container now reports when it finds the installed PostgreSQL binaries do not match the s... — committed to CrunchyData/postgres-operator by benjaminjb 2 years ago
- K8SPG-437: merge upstream 5.4.2 changes (#518) * Replace HandleDeleteNamespace Test With KUTTL (#3172) TestReconcilerHandleDeleteNamespace was prone to flakes when run with `envtest-existing`, an... — committed to percona/percona-postgresql-operator by pooknull 9 months ago
I’m fairly confident this is similar to an issue that prompted #2897 and should be fixed by that change.
The interim solution is to set the following in your PostgresCluster spec:
I stumbled upon this with RedHat’s Code Ready Containers when going through the examples:
The logs:
My environment:
The operator detects that it’s running on OpenShift but doesn’t seem to attach an appropriate SCC to the hippo-instance Role.
I have this same issue on my minikube (3 nodes). It works, however, when I use a single-node installation of minikube.
Hi there, I’ve got the same issue on OVH Public Cloud (the postgres-startup container failed with install: cannot change permissions of ‘/pgdata/pg14’: No such file or directory). This article (https://docs.ovh.com/sg/en/kubernetes/persistentvolumes-permission-errors/) solved my issue: I recreated both storage classes with the additional fsType: ext4 parameter.
TL;DR: Recreate the StorageClass with parameters.fsType: ext4.
Linode support responded to me, saying: “I agree that this appears to be related to a change on our end. To be precise, our CSI Driver had an update that can cause permissions issues with mounting Volumes for certain deployments. We’ve seen this is especially prevalent with PostgreSQL deployments.”
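The fsType fix described above can be sketched as a replacement StorageClass manifest. This is a minimal sketch, not the exact OVH-provided class: the name is hypothetical, and the provisioner shown is the standard OpenStack Cinder CSI provisioner, which may differ in your cluster (check `kubectl get sc csi-cinder-classic -o yaml`).

```yaml
# Hypothetical replacement for csi-cinder-classic with an explicit fsType.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cinder-classic-ext4
provisioner: cinder.csi.openstack.org   # assumption: verify against your existing class
parameters:
  fsType: ext4   # explicit filesystem so fsGroup chown/chmod works on mount
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

Note that StorageClass parameters are immutable, so to reuse the original name you must delete the existing class before applying the new manifest; existing PVs are unaffected, but new PVCs will pick up the ext4 setting.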
They proposed a workaround: “Run the command below and then redeploy your workloads.”
This worked for me, and I hope it will help someone here.