prometheus-operator: Permission denied writing to mount using volumeClaimTemplate
What did you do?
I ran the latest versions of the Prometheus Operator and kube-prometheus helm charts, configured to use persistent storage for the Prometheus pods via the following storage config:
```yaml
storageSpec:
  volumeClaimTemplate:
    spec:
      selector:
        matchLabels:
          app: k8s-prometheus
      resources:
        requests:
          storage: 20Gi
```
What did you expect to see?
A volume mount used for storing persistent data.
What did you see instead? Under which circumstances?
The prometheus-kube-prometheus-0 pod keeps crashing with a permission-denied error on the mounted volume (logs below). If I change the configuration to not use the volumeClaimTemplate, it works fine. I have also tried the prometheus 2.1 image instead of the default 2.0 used in the helm chart.
This issue looks to be exactly the same as #541, but that appeared to have been resolved by setting a securityContext. Inspecting the StatefulSet JSON I can see that

```json
"securityContext": {
  "runAsUser": 1000,
  "runAsNonRoot": true,
  "fsGroup": 2000
},
```

is already set, so I suspect this problem has not been fully fixed.
Environment
- Kubernetes version information:
```
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"clean", BuildDate:"2018-01-04T20:00:41Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.6-4+9c2a4c1ed1ee7e", GitCommit:"9c2a4c1ed1ee7e2e121203aa9a87315633a89eca", GitTreeState:"clean", BuildDate:"2018-01-22T08:23:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
```
- Kubernetes cluster kind:
This cluster is running on the IBM Cloud
- Manifests:
The public helm repo was used, i.e. `helm upgrade --install --recreate-pods kube-prometheus coreos/kube-prometheus --namespace monitoring -f kube-prometheus-values.yaml`
- Prometheus Operator Logs:
```
level=info ts=2018-02-06T20:33:54.685030752Z caller=main.go:225 msg="Starting Prometheus" version="(version=2.1.0, branch=HEAD, revision=85f23d82a045d103ea7f3c89a91fba4a93e6367a)"
level=info ts=2018-02-06T20:33:54.685207496Z caller=main.go:226 build_context="(go=go1.9.2, user=root@6e784304d3ff, date=20180119-12:01:23)"
level=info ts=2018-02-06T20:33:54.685299129Z caller=main.go:227 host_details="(Linux 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 prometheus-kube-prometheus-0 (none))"
level=info ts=2018-02-06T20:33:54.685377696Z caller=main.go:228 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-02-06T20:33:54.693871782Z caller=main.go:499 msg="Starting TSDB ..."
level=info ts=2018-02-06T20:33:54.694022982Z caller=web.go:383 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-02-06T20:33:54.700144448Z caller=main.go:386 msg="Stopping scrape discovery manager..."
level=info ts=2018-02-06T20:33:54.700176485Z caller=main.go:400 msg="Stopping notify discovery manager..."
level=info ts=2018-02-06T20:33:54.700193323Z caller=main.go:424 msg="Stopping scrape manager..."
level=info ts=2018-02-06T20:33:54.700211266Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-02-06T20:33:54.700237176Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-02-06T20:33:54.700253366Z caller=notifier.go:493 component=notifier msg="Stopping notification manager..."
level=info ts=2018-02-06T20:33:54.700274951Z caller=main.go:382 msg="Scrape discovery manager stopped"
level=info ts=2018-02-06T20:33:54.700297847Z caller=main.go:396 msg="Notify discovery manager stopped"
level=info ts=2018-02-06T20:33:54.700361658Z caller=manager.go:59 component="scrape manager" msg="Starting scrape manager..."
level=info ts=2018-02-06T20:33:54.700386478Z caller=main.go:418 msg="Scrape manager stopped"
level=info ts=2018-02-06T20:33:54.700413177Z caller=main.go:570 msg="Notifier manager stopped"
level=error ts=2018-02-06T20:33:54.700427801Z caller=main.go:579 err="Opening storage failed mkdir /var/prometheus/data/wal: permission denied"
level=info ts=2018-02-06T20:33:54.700465658Z caller=main.go:581 msg="See you next time!"
```
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 53 (21 by maintainers)
Some further investigation has highlighted what the problem is here. It looks like mounts of this type are, by default, mounted with the following user/group permissions:

```
drwxr-xr-x 4 nobody 42949672 4096 Feb 12 10:41 data
```
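As an aside (not from the issue itself), the mismatch can be modeled with a small sketch of the standard Unix permission check; `may_write` is a hypothetical helper, and the uid/gid values come from the listing above and the operator's securityContext:

```python
import stat

def may_write(mode: int, owner_uid: int, owner_gid: int,
              uid: int, gids: set) -> bool:
    """Classic Unix check: exactly one of the owner/group/other
    permission classes applies, chosen in that order."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)

# Directory above: drwxr-xr-x (0o755), owner nobody (65534), group 42949672.
# Prometheus runs as uid 1000 with fsGroup 2000, so only the "other" bits
# apply -- and they lack write permission, hence "mkdir ... permission denied".
print(may_write(0o755, 65534, 42949672, 1000, {2000}))  # False
```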
I suspect this is due to https://github.com/coreos/prometheus-operator/blob/master/pkg/prometheus/statefulset.go#L365, which sets prometheus to run as uid 1000, gid 2000. Since uid 1000 is not the owner (`nobody`) and gid 2000 is not the directory's group, a user with those IDs is not allowed to write to a directory with the permissions shown above. The fix is to update the ownership of the /var/prometheus/data directory on startup to match the user the program is being run as. This has already been done in the official prometheus helm chart - https://github.com/kubernetes/charts/commit/7d5a3ff4b105c695f332b2a8ff360e891477e6e9#diff-97df733ade0fb9ea384f77bf3a393a0a
i.e. the statefulset needs to have
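A minimal sketch of such an init container, modeled on the initChownData change in the chart commit linked above (the image and volume name are illustrative assumptions; 1000:2000 matches the operator's runAsUser/fsGroup shown earlier in this issue):

```yaml
# Illustrative sketch only -- not the operator's actual generated spec.
# Runs as root before the prometheus container starts and chowns the
# data directory so uid 1000 / gid 2000 can write to it.
initContainers:
  - name: init-chown-data
    image: busybox:1.29
    command: ["chown", "-R", "1000:2000", "/var/prometheus/data"]
    volumeMounts:
      - name: prometheus-kube-prometheus-db  # assumed volume name
        mountPath: /var/prometheus/data
```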
@paskal What does this mean? I don’t really understand. Is prometheus-operator no longer being developed because of an alternative? As I understand it, helm/charts doesn’t provide any software and/or custom CRDs…
It’s been quite some time since the chart was moved. https://github.com/coreos/prometheus-operator/blob/master/helm/README.md
@icy please see here: https://github.com/coreos/prometheus-operator#prometheus-operator-vs-kube-prometheus-vs-community-helm-chart
@paskal, I don’t think the chart change fixes this problem. initChownData is in the prometheus chart, not in prometheus-operator, and prometheus-operator runs prometheus without the prometheus chart; the initContainer config can only be added in prometheus-operator’s source code.
As noted above, changing the securityContext solves this problem, but why should we have to change the securityContext? We could solve this problem simply by supporting initChownData.
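For reference, a hedged sketch of the securityContext workaround expressed through the Prometheus custom resource (field names follow the operator's monitoring.coreos.com/v1 API; the resource name and the specific fsGroup value are assumptions and must correspond to a group that can actually write to the underlying volume):

```yaml
# Illustrative only: override the pod-level securityContext on the
# Prometheus custom resource rather than patching the StatefulSet.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus   # assumed name
spec:
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
    fsGroup: 65534        # assumption: group that owns the volume's data dir
```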
For anyone who comes across this issue while looking for why your PersistentVolumeClaim isn’t mounting (as I did): if you’ve recently updated to v0.26.0 of the Prometheus Operator, see this issue: alertmanager and prometheus-k8s breaks podsecuritypolicy in 0.26