prometheus-operator: Unable to run rules-configmap-reloader on Docker 18.9.2 due to 10Mi memory limit
What did you do? Upgraded Docker to 18.9.2
What did you expect to see? Pods continue running
What did you see instead? Under which circumstances? Pods in a CrashLoopBackOff with the following error:
Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused "process_linux.go:390: setting cgroup config for procHooks process caused \"failed to write 10485760 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/kubepods/podcb7754b6-3117-11e9-853f-da195751b071/rules-configmap-reloader/memory.limit_in_bytes: device or resource busy\""": unknown
Environment
- Prometheus Operator version:
v0.27.0
- Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:35:51Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind:
kubeadm
- Docker version:
18.9.2
This seems to be caused by the runc fix for CVE-2019-5736, which copies the runc binary into memory and charges that copy against the container's memory cgroup, so the 10Mi limit can no longer be written.
Since there are existing PRs open for increasing the memory limit, is it possible to fast-track them into a new release to address the issue for folks mitigating CVE-2019-5736?
PR for increasing the limits directly: https://github.com/coreos/prometheus-operator/pull/2371
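For anyone hitting this before a fixed release is out, one way to confirm that the failing sidecar is the one pinned at 10Mi is to read the limits off the pod spec. A minimal sketch, assuming the default kube-prometheus names (namespace monitoring, pod prometheus-k8s-0, container rules-configmap-reloader); adjust to your install:
```sh
# Print the resources the operator set on the rule reloader sidecar.
# Namespace, pod and container names are assumptions from a default
# kube-prometheus install; substitute the names from your cluster.
kubectl -n monitoring get pod prometheus-k8s-0 \
  -o jsonpath='{.spec.containers[?(@.name=="rules-configmap-reloader")].resources}'
```
If this prints a 10Mi memory limit, the pod will keep crash-looping on a patched Docker/runc until the limit is raised.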
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 14
- Comments: 23 (11 by maintainers)
Commits related to this issue
- Quick hack fix for https://github.com/coreos/prometheus-operator/issues/2409 — committed to deas/prometheus-operator by deas 5 years ago
- Bump prometheus-operator version This will allow us to upgrade to k8s 1.12.10. An error was caught in testing, which was outlined in this issue: https://github.com/coreos/prometheus-operator/issues/2... — committed to ministryofjustice/cloud-platform-infrastructure by jasonBirchall 5 years ago
I updated the Deployment for the Prometheus Operator to use 0.29, and that seems to fix the issue with the Prometheus Pods. My Alertmanager Pod is still in a CrashLoopBackOff state, but I’ll dig into that separately (not sure whether it is related).
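For reference, the version bump can be done in place; a rough sketch, assuming the operator runs as a Deployment named prometheus-operator in the monitoring namespace (both names are assumptions, verify with kubectl get deploy):
```sh
# Point the operator Deployment at the v0.29.0 image.
# Deployment, namespace and container names are assumptions; check yours first.
kubectl -n monitoring set image deployment/prometheus-operator \
  prometheus-operator=quay.io/coreos/prometheus-operator:v0.29.0
```
The operator should then reconcile the Prometheus StatefulSets it manages with whatever reloader limits that release ships.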
I tried to deploy the operator using the stable Helm chart (which uses v0.29), but the configmap-reloader keeps crashing with the same error.
I noticed that #2403 had increased the memory and CPU requests, but when I described the pod, the memory requests and limits for that container in both the Alertmanager and Prometheus StatefulSets are still 10Mi. Should I create a separate issue for this? Are there any changes that need to be propagated to the charts repo?
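One way to check whether the bump from #2403 actually reached the generated workloads is to read the limits straight off the StatefulSets the operator renders. A sketch, again assuming kube-prometheus default object names; the Alertmanager reloader container name in particular is an assumption:
```sh
# Memory limit on the Prometheus rule reloader sidecar.
kubectl -n monitoring get statefulset prometheus-k8s -o \
  jsonpath='{.spec.template.spec.containers[?(@.name=="rules-configmap-reloader")].resources.limits.memory}'

# Memory limit on the Alertmanager config reloader (container name assumed).
kubectl -n monitoring get statefulset alertmanager-main -o \
  jsonpath='{.spec.template.spec.containers[?(@.name=="config-reloader")].resources.limits.memory}'
```
If these still report 10Mi, the chart is likely deploying an operator image that predates the bump, or the values are being overridden elsewhere.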