postgres-operator: postgres-startup init fail

Overview

When I create a Postgres cluster, the instance crashes on initialization. The bug happens only on Kubernetes 1.22.

Environment

  • Platform: Kubernetes on OVH Managed Kubernetes Service
  • Platform Version: 1.22.2
  • PGO Image Tag: ubi8-5.0.4-0
  • Postgres Version: 13.5
  • Storage: csi-cinder-classic

Steps to Reproduce

REPRO

  1. Install the postgres-operator example with kustomize
  2. Install the Postgres hippo cluster defined in the example

EXPECTED

  1. The Postgres cluster starts correctly.

ACTUAL

The postgres-startup container fails to initialize the cluster:

Initializing ...
::postgres-operator: uid::26
::postgres-operator: gid::26
::postgres-operator: postgres path::/usr/pgsql-13/bin/postgres
::postgres-operator: postgres version::postgres (PostgreSQL) 13.5
::postgres-operator: config directory::/pgdata/pg13
::postgres-operator: data directory::/pgdata/pg13
install: cannot change permissions of ‘/pgdata/pg13’: No such file or directory
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Warning  FailedScheduling        40s                default-scheduler        0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               38s                default-scheduler        Successfully assigned postgres-operator/hippo-instance1-mzs4-0 to node-b4422cc5-9dc5-4116-a4d5-a194f6171309
  Normal   SuccessfulAttachVolume  35s                attachdetach-controller  AttachVolume.Attach succeeded for volume "ovh-managed-kubernetes-uo1sfr-pvc-729b89ad-c0dd-48f9-847b-36e36d8cc8e6"
  Normal   Pulled                  30s                kubelet                  Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.260460237s
  Normal   Pulled                  28s                kubelet                  Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.232337581s
  Normal   Pulling                 15s (x3 over 32s)  kubelet                  Pulling image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0"
  Normal   Created                 14s (x3 over 30s)  kubelet                  Created container postgres-startup
  Normal   Started                 14s (x3 over 30s)  kubelet                  Started container postgres-startup
  Normal   Pulled                  14s                kubelet                  Successfully pulled image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.5-0" in 1.226427728s
  Warning  BackOff                 13s (x3 over 27s)  kubelet                  Back-off restarting failed container

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 8
  • Comments: 34 (8 by maintainers)

Most upvoted comments

I’m fairly confident this is similar to the issue that prompted #2897 and should be fixed by it.

The interim solution is to set the following in your PostgresCluster spec:

spec:
  openshift: false
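
For context, here is a minimal sketch of where that field sits in the manifest. The cluster name and Postgres version are taken from this report; the remaining fields of the example manifest are omitted rather than shown, so this fragment is not a complete spec on its own.

```yaml
# Sketch only: instances, backups, and image settings are left out and
# should stay as they are in the example manifest.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  # Interim workaround: state explicitly that this is not OpenShift
  # instead of relying on the operator's own detection.
  openshift: false
  postgresVersion: 13
```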

I stumbled upon this with RedHat’s Code Ready Containers when going through the examples:

$ kubectl apply -f kustomize/postgres                              
postgrescluster.postgres-operator.crunchydata.com/hippo created
error: error validating "kustomize/postgres/kustomization.yaml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false

The logs:

$ kubectl logs hippo-instance1-9f6m-0 postgres-startup 
Initializing ...
::postgres-operator: uid::26
::postgres-operator: gid::26
::postgres-operator: postgres path::/usr/pgsql-14/bin/postgres
::postgres-operator: postgres version::postgres (PostgreSQL) 14.3
::postgres-operator: config directory::/pgdata/pg14
::postgres-operator: data directory::/pgdata/pg14
install: cannot create directory ‘/pgdata’: Permission denied

My environment:

$ oc version
Client Version: 4.10.14
Server Version: 4.10.14
Kubernetes Version: v1.23.5+b463d71

The operator detects that it’s running on OpenShift but doesn’t seem to attach an appropriate SCC to the hippo-instance Role.

I have the same issue on my minikube cluster (3 nodes). However, it works when I use a single-node minikube installation.

Hi there, I’ve got the same issue on OVH Public Cloud (the postgres-startup container failed with install: cannot change permissions of ‘/pgdata/pg14’: No such file or directory). This article (https://docs.ovh.com/sg/en/kubernetes/persistentvolumes-permission-errors/) solved my issue. I recreated both storage classes with the additional fsType: ext4 parameter.

TLDR: Recreate the StorageClass with parameters.fsType: ext4
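
As a rough sketch, a recreated StorageClass could look like the one below. The class name comes from the Environment section of this issue. The provisioner name is an assumption (the OpenStack Cinder CSI driver), and any other parameters should be copied from the existing class, for example from the output of kubectl get storageclass csi-cinder-classic -o yaml. StorageClass parameters cannot be changed in place, which is why the class has to be deleted and created again.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cinder-classic
# Assumption: provisioner of the OpenStack Cinder CSI driver;
# verify against the existing StorageClass before applying.
provisioner: cinder.csi.openstack.org
parameters:
  # The addition suggested in the comment above.
  fsType: ext4
  # Assumption: keep every other parameter the existing class already has.
```

Volumes that were provisioned before the change keep their old settings, so the affected PersistentVolumeClaims most likely need to be recreated as well.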

Linode support responded: “I agree that this appears to be related to a change on our end. To be precise, our CSI Driver had an update that can cause permissions issues with mounting Volumes for certain deployments. We’ve seen this is especially prevalent with PostgreSQL deployments.”

They proposed a workaround: “Run the command below and then redeploy your workloads”:

-p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--default-fstype=ext4"}]'

This worked for me, and I hope it helps someone here.
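
To make the patch easier to read: it is a JSON Patch "add" operation whose path ends in "/-", which appends a value to the end of a list. The fragment below sketches the resulting shape of the patched workload. The kind and name of the resource being patched are not shown in the command above, so they are left out here too.

```yaml
# Shape implied by the patch path /spec/template/spec/containers/0/args/-
spec:
  template:
    spec:
      containers:
        # First container in the pod template (index 0 in the patch path).
        - args:
            # Existing arguments stay in place; the new flag is appended last.
            - --default-fstype=ext4
```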