postgres-operator: securityContext.fsGroup setting leads to permission issues on restart / remount

Overview

Our Postgres instances regularly stop working with the following in the logs:

2022-02-08 13:36:54.275 UTC [1257059] FATAL:  data directory "/pgdata/pg13" has invalid permissions
2022-02-08 13:36:54.275 UTC [1257059] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).

Indeed, after restarts / remounts, the pg13 folder has permissions rwxrwx---, even though we removed group write manually.

We suspect the issue is that securityContext.fsGroup is being set at https://github.com/CrunchyData/postgres-operator/blob/2e18aef93dd2d6dee065ad00c959dc9fabc6da79/internal/postgres/reconcile.go#L280

According to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context this causes the CSI driver to add group 26 to the volume and to OR the existing permission bits with rw-rw----, which always adds the unwanted g+w.
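To illustrate why the data directory can never pass the check after a remount, here is a minimal sketch in Python. It assumes only what is stated above: the documented rw-rw---- mask is OR'd into the existing mode, and Postgres accepts exactly the two modes named in the error message. The function names are hypothetical and this is not the kubelet or Postgres implementation. The rwxrwx--- we actually observe also has group execute set, which this simplified mask does not model.

```python
# Simplified model of the fsGroup permission change (assumption: only the
# documented rw-rw---- mask is applied; the real behaviour for directories
# may set additional bits).
FSGROUP_MASK = 0o660  # rw-rw----

# Modes named in the Postgres error message above.
ACCEPTED_MODES = (0o700, 0o750)


def apply_fsgroup_mask(mode: int) -> int:
    """Return the mode after OR-ing in the fsGroup mask."""
    return mode | FSGROUP_MASK


def postgres_accepts(mode: int) -> bool:
    """Return True if the mode is one Postgres reports as valid."""
    return mode in ACCEPTED_MODES


if __name__ == "__main__":
    for before in ACCEPTED_MODES:
        after = apply_fsgroup_mask(before)
        print(oct(before), "->", oct(after), "accepted:", postgres_accepts(after))
```

Both valid starting modes end up with group write set, so manually removing g+w only helps until the next mount.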

We tried setting openshift: true, which removes fsGroup from the descriptor, but this causes the postgres startup container to fail because the secrets can no longer be read:

Initializing ...
::postgres-operator: uid::26
::postgres-operator: gid::26
::postgres-operator: postgres path::/usr/pgsql-13/bin/postgres
::postgres-operator: postgres version::postgres (PostgreSQL) 13.5
::postgres-operator: config directory::/pgdata/pg13
::postgres-operator: data directory::/pgdata/pg13
install: cannot open '/pgconf/tls/replication/tls.crt' for reading: Permission denied
install: cannot open '/pgconf/tls/replication/tls.key' for reading: Permission denied
install: cannot open '/pgconf/tls/replication/ca.crt' for reading: Permission denied

Environment


Platform: k3s v1.22.5+k3s1
PGO: 5.0.4
Postgres Version: crunchy-postgres:centos8-13.5-0
Storage: Hetzner CSI Driver v1.6.0 (https://github.com/hetznercloud/csi-driver)

About this issue

State: closed
Created 2 years ago
Reactions: 8
Comments: 26 (6 by maintainers)

Most upvoted comments

Hello, just wanted to let y’all know, in case you haven’t seen the attached PR, that the fsGroupChangePolicy fix suggested and tested in this thread has been merged into PGO and should be out in the next release. Reminder (for future people who find this): this fix is only effective in K8s 1.20+.

Hope this helps and if you’re still experiencing a problem like this, please let the community know.

benjaminjb on Jul 17, 2022

I have been using the change mentioned by @cbandy for about 6 weeks now and haven’t had any problems since then.

@matzik12 We did not change anything manually in our clusters, but used a policy agent to automatically patch the pods created by the controller manager. Here is an example for Kyverno; since OPA Gatekeeper now supports mutations, it should work there as well.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: fix-postgres-fsgroup
spec:
  rules:
  - name: fix-postgres-fsgroup
    match:
      all:
      - resources:
          kinds:
          - Pod
          selector:
            matchExpressions:
            - key: postgres-operator.crunchydata.com/instance
              operator: Exists
    mutate:
      patchStrategicMerge:
        spec:
          securityContext:
            fsGroupChangePolicy: "OnRootMismatch"

heilerich on Jun 24, 2022
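
For readers wondering why OnRootMismatch helps here: with that policy, the recursive ownership and permission change is skipped when the root of the volume already matches what fsGroup would produce, so subdirectories such as /pgdata/pg13 keep the mode Postgres set. The sketch below is a simplified Python model of that decision, not the kubelet's code. The function name, the parameters, and the exact set of required bits (group rwx plus setgid on the volume root) are assumptions made for illustration.

```python
import stat

# Assumed bits the volume root must already carry for the change to be
# skipped: rwxrwx--- plus the setgid bit (illustrative, not authoritative).
REQUIRED_ROOT_BITS = 0o770 | stat.S_ISGID


def needs_recursive_change(root_gid: int, root_mode: int,
                           fs_group: int, policy: str = "Always") -> bool:
    """Return True if the whole volume would be re-chowned/re-chmodded."""
    if policy != "OnRootMismatch":
        # Default policy: always walk the volume and change everything.
        return True
    if root_gid != fs_group:
        return True
    # Change is needed only if some required bit is missing on the root.
    return (root_mode & REQUIRED_ROOT_BITS) != REQUIRED_ROOT_BITS


if __name__ == "__main__":
    # Volume root already owned by group 26 with setgid and rwxrwx---.
    print(needs_recursive_change(26, 0o2770, 26, "OnRootMismatch"))
    print(needs_recursive_change(26, 0o2770, 26, "Always"))
```

Under this model the first mount still applies the change once, since a fresh volume root will not match. Later restarts leave the data directory alone.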