prometheus-operator: Unable to run rules-configmap-reloader on Docker 18.09.2 due to 10Mi limit

What did you do? Upgraded Docker to 18.09.2

What did you expect to see? Pods continue running

What did you see instead? Under which circumstances? Pods in a crash loop with the following error:

Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused "process_linux.go:390: setting cgroup config for procHooks process caused \"failed to write 10485760 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/kubepods/podcb7754b6-3117-11e9-853f-da195751b071/rules-configmap-reloader/memory.limit_in_bytes: device or resource busy\""": unknown

Environment

  • Prometheus Operator version: v0.27.0
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:35:51Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind: kubeadm
  • Docker version: 18.09.2

This seems to be caused by the runc fix for CVE-2019-5736, which copies the runc binary into memory before starting the container; that copy is charged against the container's memory cgroup and pushes the reloader sidecar over its 10Mi limit.
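
For reference, the 10485760 bytes in the error is exactly the 10Mi limit the operator hard-codes for the reloader sidecar. In the generated StatefulSet it shows up as a resources block roughly like the sketch below (container name taken from the error above; other fields are omitted and version-dependent):

    # Sketch of the reloader sidecar's resources in the operator-generated
    # StatefulSet. 10Mi == 10485760 bytes, the value runc fails to write
    # to memory.limit_in_bytes.
    - name: rules-configmap-reloader
      resources:
        limits:
          memory: 10Mi
        requests:
          memory: 10Mi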

Since there are existing PRs open for increasing the memory limit, is it possible to fast-track these into a new release to address the issue for folks mitigating CVE-2019-5736?

PR for increasing the limits directly: https://github.com/coreos/prometheus-operator/pull/2371

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 14
  • Comments: 23 (11 by maintainers)

Most upvoted comments

I updated the Deployment for the Prometheus Operator to use 0.29, and that seems to fix the issue with the Prometheus pods. My Alertmanager pod is still in a CrashLoopBackOff state, but I’ll dig into that separately (not sure whether it is related).

@scottslowe we ran into the same issue here: after upgrading to prometheus-operator 0.29, Alertmanager was still in CrashLoopBackOff because the sidecar was not getting its memory bump. I assume there was some sort of race condition in the upgrade, as all it took to eventually fix it was deleting the Alertmanager pod; it came back with 25MB of memory for the sidecar.
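
For anyone hitting the same thing, a minimal sketch of that workaround, assuming kube-prometheus-style names and a monitoring namespace (adjust to your cluster):

    # Delete the Alertmanager pod so the StatefulSet controller recreates it
    # with the sidecar resources generated by the upgraded operator.
    kubectl -n monitoring delete pod alertmanager-main-0

    # Check that the recreated pod's containers picked up the new memory limits.
    kubectl -n monitoring get pod alertmanager-main-0 \
      -o jsonpath='{.spec.containers[*].resources.limits.memory}'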

I tried to deploy the operator using the stable Helm chart (which uses v0.29), but the configmap-reloader keeps crashing with the same error:

Error: failed to start container "rules-configmap-reloader": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:367: setting cgroup config for procHooks process caused \\\"failed to write 10485760 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/kubepods/burstable/podf0efd209-3757-11e9-a0c5-000d3ab695db/rules-configmap-reloader/memory.limit_in_bytes: device or resource busy\\\"\"": unknown

I noticed that #2403 increased the memory and CPU requests, but when I described the pods, the memory requests and limits for that container in both the Alertmanager and Prometheus StatefulSets were still 10Mi. Should I create a separate issue for this? Are there any changes that need to be propagated to the charts repo?
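
In case it helps with triage, a quick way to see what the generated StatefulSets actually specify, again assuming kube-prometheus-style names and namespace (the Helm chart's release names will differ):

    # Print the memory limits of every container in the generated StatefulSets;
    # the reloader sidecars should report more than 10Mi once the bump applies.
    kubectl -n monitoring get statefulset prometheus-k8s \
      -o jsonpath='{.spec.template.spec.containers[*].resources.limits.memory}'
    kubectl -n monitoring get statefulset alertmanager-main \
      -o jsonpath='{.spec.template.spec.containers[*].resources.limits.memory}'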