prometheus-operator: Permission denied using volumeClaimTemplate w/ automatically provisioned storage
What did you do?
I ran the latest versions of the Prometheus Operator (v0.11.0 and v0.11.1) configured to use the new Prometheus v2.0.0-beta.0 version with persistent storage on the Prometheus pods using the following storage config:
```yaml
...
storage:
  volumeClaimTemplate:
    metadata:
      annotations:
        volume.beta.kubernetes.io/storage-class: ssd
    spec:
      resources:
        requests:
          storage: 10Gi
...
```
Note: `ssd` is a StorageClass for AWS EBS gp2 volumes.
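For context, a StorageClass along these lines would provide that behaviour (a sketch assuming the in-tree AWS EBS provisioner; not copied from the cluster in question):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/aws-ebs  # in-tree AWS EBS dynamic provisioner
parameters:
  type: gp2  # general-purpose SSD volume type
```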
What did you expect to see?
Pods on the Prometheus StatefulSet launching correctly.
What did you see instead? Under which circumstances?
The `prometheus-k8s-0` pod fails to start due to a permissions issue on the persistent volume and ends up in a `CrashLoopBackOff` state. Inspecting the node reveals that the mount point of the persistent volume created by the Prometheus Operator is owned by `root`, which is not the case for mount points of persistent volumes on a regular StatefulSet using the same StorageClass.
The pods launch correctly when the `volumeClaimTemplate` configuration is omitted.
This issue seems similar to #518, although in this case storage is being provisioned automatically.
Environment
- Kubernetes version information:
Tested on both 1.6.2 and 1.6.4.
```
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4+coreos.0", GitCommit:"8996efde382d88f0baef1f015ae801488fcad8c4", GitTreeState:"clean", BuildDate:"2017-05-19T21:11:20Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
```
- Kubernetes cluster kind:
Custom terraform deploy on AWS using CoreOS 1409.7.0
- Manifests:
see contrib/kube-prometheus/manifests
- Prometheus Operator Logs:
```
time="2017-08-04T16:56:37Z" level=info msg="Starting prometheus (version=2.0.0-beta.0, branch=master, revision=2b5d9159537cbd123219296121e05244e26c0940)" source="main.go:202"
time="2017-08-04T16:56:37Z" level=info msg="Build context (go=go1.8.3, user=root@fc24486243df, date=20170712-12:21:13)" source="main.go:203"
time="2017-08-04T16:56:37Z" level=info msg="Host details (Linux 4.11.11-coreos #1 SMP Tue Jul 18 23:06:59 UTC 2017 x86_64 prometheus-k8s-0 (none))" source="main.go:204"
time="2017-08-04T16:56:37Z" level=info msg="Starting tsdb" source="main.go:216"
time="2017-08-04T16:56:37Z" level=error msg="Opening storage failed: open DB in /var/prometheus/data: open /var/prometheus/data/969552713: permission denied" source="main.go:219"
```
About this issue
- State: closed
- Created 7 years ago
- Reactions: 3
- Comments: 43 (24 by maintainers)
Commits related to this issue
- core:set security context on prometheus statefulset allow prometheus pods to access PV mount points on clusters using dynamic volume provisioning fixes #541 — committed to Capitrium/prometheus-operator by Capitrium 7 years ago
This might have to do with the docker image being built with user `nobody`. We should do so anyway, but I think this might be fixed by setting the correct `securityContext`: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
To try this out, you could take a copy of the generated StatefulSet and set the `securityContext` in the PodTemplate. If I understand the documentation correctly, we should be able to get it working by setting `fsGroup`, `runAsUser`, and `runAsNonRoot`. I don’t have a cluster at hand with automatic PV provisioning, but I’d first try out these settings:

I just diffed your yaml against the one used by kube-prometheus:
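A hedged sketch of those `fsGroup`/`runAsUser`/`runAsNonRoot` settings in a pod template (the numeric IDs are assumptions, not values confirmed in this thread; they need to match the user the Prometheus image actually runs as):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-test
spec:
  securityContext:
    runAsUser: 1000     # run the main process as this non-root UID
    runAsNonRoot: true  # kubelet refuses to start the pod if the image would run as root
    fsGroup: 2000       # mounted volumes are group-owned by this GID and group-writable
  containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus:v2.0.0-beta.0
```

The `fsGroup` field is the piece that matters for this issue: it makes the kubelet chown the dynamically provisioned volume's mount point so a non-root process can write to it.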
The Context needs a capital `C`, then it works.

I also got this issue. Adding a `securityContext` makes it work, but what is the real root cause? Why not fix it at the source? This issue is not a new one; it appears again and again. Why is that?
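One way to address it at the source: later prometheus-operator versions expose a `securityContext` field on the Prometheus custom resource itself, which the operator propagates into the generated StatefulSet. A minimal sketch (resource names and numeric IDs are illustrative assumptions; verify the field against your operator version):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  # Illustrative IDs; match them to the UID the Prometheus image runs as.
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
    fsGroup: 2000
```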
Getting the below error:

```
$ prometheus --config.file /etc/prometheus/prometheus.yml
level=info ts=2019-02-21T06:55:02.276634178Z caller=main.go:302 msg="Starting Prometheus" version="(version=2.7.1, branch=HEAD, revision=62e591f928ddf6b3468308b7ac1de1c63aa7fcf3)"
level=info ts=2019-02-21T06:55:02.276773393Z caller=main.go:303 build_context="(go=go1.11.5, user=root@f9f82868fc43, date=20190131-11:16:59)"
level=info ts=2019-02-21T06:55:02.276836343Z caller=main.go:304 host_details="(Linux 4.14.88-88.76.amzn2.x86_64 #1 SMP Mon Jan 7 18:43:26 UTC 2019 x86_64 (none))"
level=info ts=2019-02-21T06:55:02.276909382Z caller=main.go:305 fd_limits="(soft=1024, hard=4096)"
level=info ts=2019-02-21T06:55:02.276962485Z caller=main.go:306 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-02-21T06:55:02.278870308Z caller=main.go:620 msg="Starting TSDB ..."
level=info ts=2019-02-21T06:55:02.278963041Z caller=main.go:489 msg="Stopping scrape discovery manager..."
level=info ts=2019-02-21T06:55:02.278977179Z caller=main.go:503 msg="Stopping notify discovery manager..."
level=info ts=2019-02-21T06:55:02.278991752Z caller=main.go:525 msg="Stopping scrape manager..."
level=info ts=2019-02-21T06:55:02.279001952Z caller=main.go:499 msg="Notify discovery manager stopped"
level=info ts=2019-02-21T06:55:02.279029551Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-02-21T06:55:02.280004348Z caller=main.go:485 msg="Scrape discovery manager stopped"
level=info ts=2019-02-21T06:55:02.280342642Z caller=manager.go:736 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-02-21T06:55:02.280409843Z caller=manager.go:742 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-02-21T06:55:02.280471843Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2019-02-21T06:55:02.280532375Z caller=main.go:679 msg="Notifier manager stopped"
level=info ts=2019-02-21T06:55:02.280587559Z caller=main.go:519 msg="Scrape manager stopped"
level=error ts=2019-02-21T06:55:02.280756928Z caller=main.go:688 err="opening storage failed: mkdir data/: permission denied"
```

I am not running docker/k8s. This is a basic installation on Amazon Linux. Need help ASAP.
That’s a silly mistake. Thank you for figuring it out.
No worries. I was just wondering why it works on my machine and not yours.