moby: Kernel not freeing memory cgroup causing no space left on device
I’m seeing errors relating to cgroups running out of disk space. When starting containers, I get this error:
"oci runtime error: process_linux.go:258: applying cgroup configuration for process caused "mkdir /sys/fs/cgroup/memory/docker/406cfca0c0a597091854c256a3bb2f09261ecbf86e98805414752150b11eb13a: no space left on device""
The servers have plenty of disk space and inodes. The containers’ cgroup filesystem is read-only, so nothing should be filling that area of the disk.
Do cgroup limits exist? If so, what are they?
UPDATE:
$ docker info
Containers: 101
Running: 60
Paused: 0
Stopped: 41
Images: 73
Server Version: 1.12.3
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.6.0-040600-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 40
Total Memory: 251.8 GiB
Name: sd-87633
ID: YDD7:FC5T:DCP3:ZDZO:UWP4:ZR5V:SENB:GK6N:NJGF:FB3J:T5G4:OJPZ
Docker Root Dir: /home/docker/data
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
$ uname -a
Linux sd-87633 4.6.0-040600-generic #201606100558 SMP Fri Jun 10 10:01:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ docker version
Client:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 22:01:48 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 22:01:48 2016
OS/Arch: linux/amd64
About this issue
- State: open
- Created 8 years ago
- Reactions: 24
- Comments: 83 (43 by maintainers)
Links to this issue
Commits related to this issue
- vendor/runc: Optionally disable kmem limits See: https://github.com/kubernetes/kubernetes/issues/61937 See: https://github.com/opencontainers/runc/pull/1350 See: https://github.com/moby/moby/issues/2... — committed to scality/kubernetes by NicolasT 6 years ago
- Optionally disable kmem accounting See: https://github.com/scality/kubernetes/commit/b04b0506e49b1b60ea7c8b74a5ca2edbf341cd6c See: https://github.com/kubernetes/kubernetes/issues/61937 See: https://gi... — committed to ryarnyah/runc by ryarnyah 6 years ago
- libcontainer/cgroups: do not enable kmem on broken kernels Commit fe898e7862f94 (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer even if kmem limit is not configure... — committed to kolyshkin/runc by kolyshkin 6 years ago
- libcontainer: enable to compile without kmem Commit fe898e7862f94 (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured. Kernel ... — committed to kolyshkin/runc by kolyshkin 6 years ago
- libcontainer: ability to compile without kmem Commit fe898e7862f94 (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured. Kernel... — committed to kolyshkin/runc by kolyshkin 6 years ago
- kernel: disable `CONFIG_MEMCG_KMEM` This causes kernel memory leaks when using versions of `runc` that unconditionally enable per-cgroup kernel memory resource accounting, leading to systems becoming... — committed to scality/centos-kernel by NicolasT 5 years ago
OK, a better workaround (than to reboot an affected node) is to do
echo 3 > /proc/sys/vm/drop_caches
periodically, e.g. from cron:
6 */12 * * * root echo 3 > /proc/sys/vm/drop_caches
I’ll be looking for something better, but it’s a kernel bug and there’s not much that we can do :-\
We are hitting the same bug. Is this issue resolved? As I understand it, for now the workaround is to keep restarting the node when we hit it, but that’s not feasible.
@martinlevesque the kernel keeps track of page cache entries used by processes inside a container (belonging to a particular memory cgroup). A cgroup ceases to exist not when there are zero processes in it, but when its usage drops to zero (i.e. when all its memory is freed). Due to the shared nature of the page cache, and the way current kernels work, some page cache entries may still be charged to a particular memory cgroup when a container exits, leaving its usage counters greater than zero and a “dangling” cgroup as a result.
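As a rough illustration of that state (a sketch of mine, not from this thread, assuming cgroup v1 with the cgroupfs driver and the default /sys/fs/cgroup layout), you can list Docker memory cgroups that have no processes left but still carry a non-zero charge:

for d in /sys/fs/cgroup/memory/docker/*/; do
  # no tasks left in the cgroup, but memory (e.g. page cache) is still charged to it
  if [ -z "$(cat "${d}cgroup.procs")" ] && [ "$(cat "${d}memory.usage_in_bytes")" -gt 0 ]; then
    echo "empty but still charged: $d"
  fi
done

Cgroups in this state are the ones that can linger as “dangling” entries once their directory is removed.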
Using drop_caches, we ask the kernel to shrink the page cache, forcing those entries to be removed. It might hurt overall performance in the short term (some blocks may need to be re-read from disk later, rather than served from the page cache), but the result is fewer entries in the page cache, and thus a chance for those “dangling” cgroups to see their usage counters drop to zero and be released.
You might use drop_caches once the number of cgroups is dangerously close to the limit (see the sketch below), or periodically from cron, or every N container starts. And yes, this is a dirty hack, not a good solution (and yet it is much better than a restart). The real solution is to run a kernel with the above-mentioned patches (or to backport those patches to the kernel you use).
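For the “dangerously close to the limit” variant, a minimal sketch (my assumption here is the 16-bit memory-cgroup ID space of the v1 memory controller, i.e. a ceiling of 65535; num_cgroups is the third field of /proc/cgroups):

# drop caches only when the memory cgroup count nears the ID limit; needs root
n=$(awk '$1 == "memory" { print $3 }' /proc/cgroups)
if [ "$n" -gt 60000 ]; then   # 60000 is an arbitrary safety margin below 65535
  echo 3 > /proc/sys/vm/drop_caches
fi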
Another possible workaround might be to not enable kernel memory accounting for all containers (i.e. reverting https://github.com/opencontainers/runc/pull/1350).
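A hedged way to check whether kmem accounting actually got switched on for your containers (paths assume the cgroupfs driver; on kernels where kmem accounting is opt-in, such as the RHEL/CentOS 3.10 ones discussed below, a non-zero counter means it was enabled for that cgroup):

# read the kernel-memory usage counter of every Docker container cgroup
for f in /sys/fs/cgroup/memory/docker/*/memory.kmem.usage_in_bytes; do
  echo "$f: $(cat "$f")"
done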
It might also be possible to call drop_caches right from runc itself (an ugly, ugly hack, but perhaps marginally better than using drop_caches from cron).
We came across the same issue on Docker 17.09, kernel 3.10.0. We tried to clear the memory with echo 1 > /proc/sys/vm/drop_caches, but it did not work. In the end we restarted the server, which solved the problem. However, we cannot restart the server every time, so please, does anyone have a solution that does not involve restarting the system?
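As a side note (standard kernel behaviour from Documentation/sysctl/vm.txt, not specific to this issue), the value written to drop_caches matters, which may be why echo 1 did not help above:

echo 1 > /proc/sys/vm/drop_caches   # free page cache only
echo 2 > /proc/sys/vm/drop_caches   # free reclaimable slab objects (dentries, inodes)
echo 3 > /proc/sys/vm/drop_caches   # free both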
After running Docker smoothly for over 2 years I’m getting this issue as well. Worse, I restarted and I still have 6 containers (out of 45) that are not starting for this reason!
Hmm. I’ve run this setup for over a year now. The 6 containers that are not starting are all Caddy 0.10.14 containers. I have other Caddy containers that run normally.
There are no guarantees, but I’ve been testing this for over half a year, across Linux 3.10, 4.4, 4.14, 4.18, and 5.0. It seems that updating the kernel to 4.18 or higher fixes this issue.
I tested this by creating a deployment that keeps launching containers which request more memory than their limit, so they are OOM-killed as soon as they start. On Linux 4.14 or lower, this ultimately causes the docker or even the ps command to hang, and I could only reboot the whole node to recover. This never happened again after I upgraded to 4.18.
About how ps can hang forever, here is an article.
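For reference, a minimal sketch of that kind of reproducer with plain Docker (the image and sizes are my placeholders, not from the comment): each iteration creates a memory cgroup and tears it down right after the OOM kill.

# the container is limited to 16 MiB but tries to allocate 64 MiB, so it is
# OOM-killed immediately; every run creates and destroys a memory cgroup
while true; do
  docker run --rm --memory 16m --memory-swap 16m \
    python:3 python3 -c 'x = bytearray(64 * 1024 * 1024)'
done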
Yes, it’s exactly kernel 3.10’s bug. I tested on CentOS 7.4 (3.10.0-693.11.1.el7); the latest official stable kernel has the same problem. There are many legacy systems depending on the CentOS 7.x kernel in my production environment, so we can’t upgrade to a 4.x kernel.
We are hitting the same error and found the reason after a lot of testing. The root cause is a kernel cgroup bug. (My operating system is CentOS 7.3 with the 3.10.0-514.10.2.el7.x86_64 kernel.)
If you create a container with the cgroup kernel memory option enabled, you will hit this bug. When you delete the container, you may see the cgroup memory numbers decrease as expected. But if you test carefully over a long period, you will find that the cgroup kernel memory is actually leaked, not released. I wrote a test case to reproduce this issue, related to: https://github.com/kubernetes/kubernetes/issues/61937
It can also be reproduced with plain Docker:
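(The exact commands were not preserved in this thread; a sketch along the described lines, using docker run’s --kernel-memory flag to enable kmem accounting explicitly, might look like this:)

# create and remove containers with kernel memory accounting enabled,
# then watch the kernel-side cgroup count keep growing
for i in $(seq 1 1000); do
  docker run --rm --kernel-memory 50M busybox true
done
grep memory /proc/cgroups   # on affected kernels num_cgroups stays high even with no containers left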
We also found that k8s 1.9 enables this cgroup kernel memory feature by default, while k8s 1.6 disables it by default. So if you run k8s 1.9 on CentOS 7.x, you must change the code to disable this option.
More detail (in Chinese): http://www.linuxfly.org/kubernetes-19-conflict-with-centos7/
I noticed that this issue occurs faster when I create a faulty service (one that always fails on start) and then set restart to always - wait about 4 days and this issue pops up. Normally it takes about a month until a node starts throwing those errors.
So maybe it’s related to that?
@kolyshkin Thanks for your valuable feedback. I have a Kubernetes cluster that has been hitting this issue for a year, and I have to reboot each node after 5 to 7 days of uptime. You can reach me if you need any kind of information.
Regards,
Same issue here, Ubuntu 18.04, docker 17.03, kernel 4.15.0-1025-aws
The memory cgroup count just grows and grows until a reboot is required to launch new containers. In my case, once near the limit, I quickly reach 100% CPU utilization and the server becomes unresponsive.
@mlaventure writes:
It turns out that this is not always the case. I corresponded with a Linux cgroups maintainer (Michal Hocko) recently, who said:
So, it’s not uncommon for the num_cgroups value in /proc/cgroups to differ from what you might see in lscgroup.
@qkboy it may be worth opening a ticket with Red Hat to backport the fix to their 3.10.x kernel.
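A quick way to observe that discrepancy (a sketch; lscgroup comes from the libcgroup/cgroup-tools package):

# kernel-side count, which includes zombie cgroups whose directory is already gone
awk '$1 == "memory" { print "num_cgroups:", $3 }' /proc/cgroups
# userspace view: only cgroups that still have a directory in cgroupfs
lscgroup memory:/ | wc -l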
There is indeed a kernel memory leak up to the 4.0 kernel release. You can follow this link for details: https://github.com/moby/moby/issues/6479#issuecomment-97503551
Nothing stands out in the mountinfo output, unfortunately. This may be an issue with that version of the kernel, but I haven’t found any reference to a similar issue as of now. I’m having a look at the cgroup_rmdir code just in case.
OK, I’ve got some news, good and bad.
The good news is that the problem is supposedly fixed in the v5.3-rc1 kernel (see the patches from Roman Gushchin, on top of patches by Vladimir Davydov). For an overall description, see https://lwn.net/Articles/790384/.
The bad news is that I don’t know what Docker Engine can do to work around the problem on earlier kernels. One approach would be to revert https://github.com/opencontainers/runc/pull/1350 (i.e. not enable kmem accounting by default), but I doubt that would be accepted.
Looking for other alternatives…
About the memory cgroups leaking, see this comment: https://github.com/moby/moby/issues/24559#issuecomment-232436302
Thanks, restarting helped us too; it resets the number and allows containers to start again.
@BenHall what else do you have under /sys/fs/cgroup/memory/, excluding the docker dir? (Something like find /sys/fs/cgroup/memory -type d ! -path '/sys/fs/cgroup/memory/docker*' should work.)
You can also check that you don’t have the memory cgroup mounted somewhere else by checking the output of mount or cat /proc/self/mountinfo.