prometheus-operator: Permission denied writing to mount using volumeClaimTemplate
What did you do?
I ran the latest versions of the Prometheus Operator and kube-prometheus helm charts, configured to use persistent storage for the Prometheus pods via the following storage config:
```yaml
storageSpec:
  volumeClaimTemplate:
    spec:
      selector:
        matchLabels:
          app: k8s-prometheus
      resources:
        requests:
          storage: 20Gi
```
What did you expect to see?
A volume mount used for storing persistent data.
What did you see instead? Under which circumstances?
The prometheus-kube-prometheus-0 pod keeps crashing with a permission-denied error on the mounted volume (logs below). If I change the configuration to not use the volumeClaimTemplate, it works fine. I have also tried the prometheus 2.1 image instead of the default 2.0 used in the helm chart.
This issue looks to be exactly the same as #541, but that appeared to have been resolved by setting a securityContext. Inspecting the StatefulSet JSON I can see that

```json
"securityContext": {
  "runAsUser": 1000,
  "runAsNonRoot": true,
  "fsGroup": 2000
},
```

is already set, so I suspect this problem has not been fully fixed.
Environment
- Kubernetes version information:
```
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"clean", BuildDate:"2018-01-04T20:00:41Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.6-4+9c2a4c1ed1ee7e", GitCommit:"9c2a4c1ed1ee7e2e121203aa9a87315633a89eca", GitTreeState:"clean", BuildDate:"2018-01-22T08:23:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
```
- Kubernetes cluster kind:
This cluster is running on the IBM Cloud
- Manifests:
The public helm repo was used, i.e. `helm upgrade --install --recreate-pods kube-prometheus coreos/kube-prometheus --namespace monitoring -f kube-prometheus-values.yaml`
- Prometheus Operator Logs:
```
level=info ts=2018-02-06T20:33:54.685030752Z caller=main.go:225 msg="Starting Prometheus" version="(version=2.1.0, branch=HEAD, revision=85f23d82a045d103ea7f3c89a91fba4a93e6367a)"
level=info ts=2018-02-06T20:33:54.685207496Z caller=main.go:226 build_context="(go=go1.9.2, user=root@6e784304d3ff, date=20180119-12:01:23)"
level=info ts=2018-02-06T20:33:54.685299129Z caller=main.go:227 host_details="(Linux 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 prometheus-kube-prometheus-0 (none))"
level=info ts=2018-02-06T20:33:54.685377696Z caller=main.go:228 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-02-06T20:33:54.693871782Z caller=main.go:499 msg="Starting TSDB ..."
level=info ts=2018-02-06T20:33:54.694022982Z caller=web.go:383 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-02-06T20:33:54.700144448Z caller=main.go:386 msg="Stopping scrape discovery manager..."
level=info ts=2018-02-06T20:33:54.700176485Z caller=main.go:400 msg="Stopping notify discovery manager..."
level=info ts=2018-02-06T20:33:54.700193323Z caller=main.go:424 msg="Stopping scrape manager..."
level=info ts=2018-02-06T20:33:54.700211266Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-02-06T20:33:54.700237176Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-02-06T20:33:54.700253366Z caller=notifier.go:493 component=notifier msg="Stopping notification manager..."
level=info ts=2018-02-06T20:33:54.700274951Z caller=main.go:382 msg="Scrape discovery manager stopped"
level=info ts=2018-02-06T20:33:54.700297847Z caller=main.go:396 msg="Notify discovery manager stopped"
level=info ts=2018-02-06T20:33:54.700361658Z caller=manager.go:59 component="scrape manager" msg="Starting scrape manager..."
level=info ts=2018-02-06T20:33:54.700386478Z caller=main.go:418 msg="Scrape manager stopped"
level=info ts=2018-02-06T20:33:54.700413177Z caller=main.go:570 msg="Notifier manager stopped"
level=error ts=2018-02-06T20:33:54.700427801Z caller=main.go:579 err="Opening storage failed mkdir /var/prometheus/data/wal: permission denied"
level=info ts=2018-02-06T20:33:54.700465658Z caller=main.go:581 msg="See you next time!"
```
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 53 (21 by maintainers)
Some further investigation has highlighted what the problem is here. It looks like mounts of this type are, by default, mounted with the following user/group permissions:

```
drwxr-xr-x 4 nobody 42949672 4096 Feb 12 10:41 data
```
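As an aside (not from the issue itself), the mismatch can be modeled with a small sketch of the standard Unix permission check; `may_write` is a hypothetical helper, and the uid/gid values come from the listing above and the operator's securityContext:

```python
import stat

def may_write(mode: int, owner_uid: int, owner_gid: int,
              uid: int, gids: set) -> bool:
    """Classic Unix check: exactly one of the owner/group/other
    permission classes applies, chosen in that order."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)

# Directory above: drwxr-xr-x (0o755), owner nobody (65534), group 42949672.
# Prometheus runs as uid 1000 with fsGroup 2000, so only the "other" bits
# apply -- and they lack write permission, hence "mkdir ... permission denied".
print(may_write(0o755, 65534, 42949672, 1000, {2000}))  # False
```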
I suspect this is due to https://github.com/coreos/prometheus-operator/blob/master/pkg/prometheus/statefulset.go#L365, which sets prometheus to run as uid 1000, gid 2000. Since uid 1000 is not the owner (`nobody`) and gid 2000 is not the directory's group, a user with those IDs is not allowed to write to a directory with the permissions shown above. The fix is to update the ownership of the /var/prometheus/data directory on startup to match the user the program is being run as. This has already been done in the official prometheus helm chart - https://github.com/kubernetes/charts/commit/7d5a3ff4b105c695f332b2a8ff360e891477e6e9#diff-97df733ade0fb9ea384f77bf3a393a0a
i.e. the statefulset needs to have
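A minimal sketch of such an init container, modeled on the initChownData change in the chart commit linked above (the image and volume name are illustrative assumptions; 1000:2000 matches the operator's runAsUser/fsGroup shown earlier in this issue):

```yaml
# Illustrative sketch only -- not the operator's actual generated spec.
# Runs as root before the prometheus container starts and chowns the
# data directory so uid 1000 / gid 2000 can write to it.
initContainers:
  - name: init-chown-data
    image: busybox:1.29
    command: ["chown", "-R", "1000:2000", "/var/prometheus/data"]
    volumeMounts:
      - name: prometheus-kube-prometheus-db  # assumed volume name
        mountPath: /var/prometheus/data
```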
@paskal What does this mean? I don’t really understand. Is prometheus-operator no longer being developed because of an alternative? As I understand it, helm/charts doesn’t provide any software and/or custom CRDs…
It’s been quite some time since the chart was moved. https://github.com/coreos/prometheus-operator/blob/master/helm/README.md
@icy please see here: https://github.com/coreos/prometheus-operator#prometheus-operator-vs-kube-prometheus-vs-community-helm-chart
@paskal, I don’t think the chart change fixes this problem. initChownData is in the prometheus chart, not in prometheus-operator, and prometheus-operator runs prometheus without the prometheus chart; the initContainer config can only be added in prometheus-operator’s source code.
As noted above, changing the securityContext solves this problem, but why should we have to change the securityContext? We could solve this problem simply by supporting initChownData.
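For reference, a hedged sketch of the securityContext workaround expressed through the Prometheus custom resource (field names follow the operator's monitoring.coreos.com/v1 API; the resource name and the specific fsGroup value are assumptions and must correspond to a group that can actually write to the underlying volume):

```yaml
# Illustrative only: override the pod-level securityContext on the
# Prometheus custom resource rather than patching the StatefulSet.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus   # assumed name
spec:
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
    fsGroup: 65534        # assumption: group that owns the volume's data dir
```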
For anyone who comes across this issue while looking for why your PersistentVolumeClaim isn’t mounting (as I did): if you’ve recently updated to v0.26.0 of the Prometheus Operator, see this issue: alertmanager and prometheus-k8s breaks podsecuritypolicy in 0.26