prometheus-operator: Permission denied using volumeClaimTemplate w/ automatically provisioned storage
What did you do?
I ran the latest versions of the Prometheus Operator (v0.11.0 and v0.11.1) configured to use the new Prometheus v2.0.0-beta.0 version with persistent storage on the Prometheus pods using the following storage config:
```yaml
...
storage:
  volumeClaimTemplate:
    metadata:
      annotations:
        volume.beta.kubernetes.io/storage-class: ssd
    spec:
      resources:
        requests:
          storage: 10Gi
...
```
Note: `ssd` is a StorageClass for AWS EBS gp2 volumes.
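For context, a StorageClass along these lines would provide that behaviour (a sketch assuming the in-tree AWS EBS provisioner; not copied from the cluster in question):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/aws-ebs  # in-tree AWS EBS dynamic provisioner
parameters:
  type: gp2  # general-purpose SSD volume type
```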
What did you expect to see?
Pods on the Prometheus StatefulSet launching correctly.
What did you see instead? Under which circumstances?
The `prometheus-k8s-0` pod fails to start due to a permissions issue on the persistent volume and ends up in a `CrashLoopBackOff` state. Inspecting the node reveals that the mount point of the persistent volume created by the Prometheus Operator is owned by `root`, which is not the case for mount points of persistent volumes on a regular StatefulSet using the same StorageClass.
The pods launch correctly when the `volumeClaimTemplate` configuration is omitted.
This issue seems similar to #518, although in this case storage is being provisioned automatically.
Environment
- Kubernetes version information:
Tested on both 1.6.2 and 1.6.4.
```
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4+coreos.0", GitCommit:"8996efde382d88f0baef1f015ae801488fcad8c4", GitTreeState:"clean", BuildDate:"2017-05-19T21:11:20Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
```
- Kubernetes cluster kind:
Custom terraform deploy on AWS using CoreOS 1409.7.0
- Manifests:
see contrib/kube-prometheus/manifests
- Prometheus Operator Logs:
```
time="2017-08-04T16:56:37Z" level=info msg="Starting prometheus (version=2.0.0-beta.0, branch=master, revision=2b5d9159537cbd123219296121e05244e26c0940)" source="main.go:202"
time="2017-08-04T16:56:37Z" level=info msg="Build context (go=go1.8.3, user=root@fc24486243df, date=20170712-12:21:13)" source="main.go:203"
time="2017-08-04T16:56:37Z" level=info msg="Host details (Linux 4.11.11-coreos #1 SMP Tue Jul 18 23:06:59 UTC 2017 x86_64 prometheus-k8s-0 (none))" source="main.go:204"
time="2017-08-04T16:56:37Z" level=info msg="Starting tsdb" source="main.go:216"
time="2017-08-04T16:56:37Z" level=error msg="Opening storage failed: open DB in /var/prometheus/data: open /var/prometheus/data/969552713: permission denied" source="main.go:219"
```
About this issue
- State: closed
- Created 7 years ago
- Reactions: 3
- Comments: 43 (24 by maintainers)
Commits related to this issue
- core:set security context on prometheus statefulset allow prometheus pods to access PV mount points on clusters using dynamic volume provisioning fixes #541 — committed to Capitrium/prometheus-operator by Capitrium 7 years ago
This might have to do with the docker image being built with user `nobody`. We should do so anyway, but I think this might be fixed by setting the correct `securityContext`: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
To try this out, you could take a copy of the generated StatefulSet and set the `securityContext` in the PodTemplate. If I understand the documentation correctly, we should be able to get it working by setting `fsGroup`, `runAsUser`, and `runAsNonRoot`. I don’t have a cluster at hand with automatic PV provisioning, but I’d first try out these settings:

I just diffed your yaml against the one used by kube-prometheus:
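A hedged sketch of those `fsGroup`/`runAsUser`/`runAsNonRoot` settings in a pod template (the numeric IDs are assumptions, not values confirmed in this thread; they need to match the user the Prometheus image actually runs as):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-test
spec:
  securityContext:
    runAsUser: 1000     # run the main process as this non-root UID
    runAsNonRoot: true  # kubelet refuses to start the pod if the image would run as root
    fsGroup: 2000       # mounted volumes are group-owned by this GID and group-writable
  containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus:v2.0.0-beta.0
```

The `fsGroup` field is the piece that matters for this issue: it makes the kubelet chown the dynamically provisioned volume's mount point so a non-root process can write to it.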
The Context needs a capital `C`, then it works.

I also got this issue. Adding a `securityContext` makes it work, but what is the real root cause? Why not fix it at the source? This issue is not a new one; it appears again and again. Why is that?
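One way to address it at the source: later prometheus-operator versions expose a `securityContext` field on the Prometheus custom resource itself, which the operator propagates into the generated StatefulSet. A minimal sketch (resource names and numeric IDs are illustrative assumptions; verify the field against your operator version):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  # Illustrative IDs; match them to the UID the Prometheus image runs as.
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
    fsGroup: 2000
```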
Getting the below error:

```
$ prometheus --config.file /etc/prometheus/prometheus.yml
level=info ts=2019-02-21T06:55:02.276634178Z caller=main.go:302 msg="Starting Prometheus" version="(version=2.7.1, branch=HEAD, revision=62e591f928ddf6b3468308b7ac1de1c63aa7fcf3)"
level=info ts=2019-02-21T06:55:02.276773393Z caller=main.go:303 build_context="(go=go1.11.5, user=root@f9f82868fc43, date=20190131-11:16:59)"
level=info ts=2019-02-21T06:55:02.276836343Z caller=main.go:304 host_details="(Linux 4.14.88-88.76.amzn2.x86_64 #1 SMP Mon Jan 7 18:43:26 UTC 2019 x86_64 (none))"
level=info ts=2019-02-21T06:55:02.276909382Z caller=main.go:305 fd_limits="(soft=1024, hard=4096)"
level=info ts=2019-02-21T06:55:02.276962485Z caller=main.go:306 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-02-21T06:55:02.278870308Z caller=main.go:620 msg="Starting TSDB ..."
level=info ts=2019-02-21T06:55:02.278963041Z caller=main.go:489 msg="Stopping scrape discovery manager..."
level=info ts=2019-02-21T06:55:02.278977179Z caller=main.go:503 msg="Stopping notify discovery manager..."
level=info ts=2019-02-21T06:55:02.278991752Z caller=main.go:525 msg="Stopping scrape manager..."
level=info ts=2019-02-21T06:55:02.279001952Z caller=main.go:499 msg="Notify discovery manager stopped"
level=info ts=2019-02-21T06:55:02.279029551Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-02-21T06:55:02.280004348Z caller=main.go:485 msg="Scrape discovery manager stopped"
level=info ts=2019-02-21T06:55:02.280342642Z caller=manager.go:736 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-02-21T06:55:02.280409843Z caller=manager.go:742 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-02-21T06:55:02.280471843Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2019-02-21T06:55:02.280532375Z caller=main.go:679 msg="Notifier manager stopped"
level=info ts=2019-02-21T06:55:02.280587559Z caller=main.go:519 msg="Scrape manager stopped"
level=error ts=2019-02-21T06:55:02.280756928Z caller=main.go:688 err="opening storage failed: mkdir data/: permission denied"
```

I am not running docker/k8s. This is a basic installation on Amazon Linux. Need help ASAP.
That’s a silly mistake. Thank you for figuring it out.
No worries. I was just wondering why it works on my machine and not yours.