moby: Kernel not freeing memory cgroup causing no space left on device

I’m seeing errors that point to cgroups running out of space. When starting containers, I get this error:

"oci runtime error: process_linux.go:258: applying cgroup configuration for process caused "mkdir /sys/fs/cgroup/memory/docker/406cfca0c0a597091854c256a3bb2f09261ecbf86e98805414752150b11eb13a: no space left on device""

The servers have plenty of disk space and inodes. The containers’ cgroup mount is read-only, so nothing should be filling up that area of the disk.

Do cgroup limits exist? If so, what are they?

UPDATE:

$ docker info
Containers: 101
 Running: 60
 Paused: 0
 Stopped: 41
Images: 73
Server Version: 1.12.3
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.6.0-040600-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 40
Total Memory: 251.8 GiB
Name: sd-87633
ID: YDD7:FC5T:DCP3:ZDZO:UWP4:ZR5V:SENB:GK6N:NJGF:FB3J:T5G4:OJPZ
Docker Root Dir: /home/docker/data
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

$ uname -a
Linux sd-87633 4.6.0-040600-generic #201606100558 SMP Fri Jun 10 10:01:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 22:01:48 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 22:01:48 2016
 OS/Arch:      linux/amd64

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 24
  • Comments: 83 (43 by maintainers)

Most upvoted comments

OK, a better workaround (than to reboot an affected node) is to do

echo 3 > /proc/sys/vm/drop_caches

periodically, e.g. from cron:

6 */12 * * * root echo 3 > /proc/sys/vm/drop_caches

I’ll be looking for something better, but it’s a kernel bug and there’s not much that we can do :-\

We are hitting the same bug. Is this issue resolved? As I understand it, the workaround for now is to keep restarting the node when we hit it, but that’s not feasible.

@martinlevesque the kernel keeps track of page cache entries used by processes inside a container (belonging to a particular memory cgroup). A cgroup ceases to exist not when there are zero processes in it, but rather when its usage drops to zero (i.e. when all of its memory is freed). Due to the shared nature of the page cache, and the way current kernels work, some page cache entries might still be charged to a particular memory cgroup when a container exits, leaving usage counters greater than zero and a “dangling” cgroup as a result.

Using drop_caches, we ask the kernel to shrink the page cache, forcing those entries to be removed. It might hurt overall performance in the short term (some blocks may need to be re-read from disk later rather than served from the page cache), but the result is fewer entries in the page cache, and thus a chance for those “dangling” cgroups’ usage counters to drop to zero so the cgroups can be released.

You might use drop_caches once the number of cgroups is dangerously close to the limit, or periodically from cron, or every N container starts. Yes, this is a dirty hack, not a good solution (and yet it is much better than a restart).
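
A minimal sketch of the “close to the limit” variant, assuming the usual 16-bit memcg ID ceiling (roughly 65k memory cgroups); the threshold below is an arbitrary choice, not a value from this thread:

#!/bin/sh
# Drop caches only when the number of memory cgroups approaches the limit.
LIMIT=65535        # assumed ceiling: memcg IDs are 16-bit
THRESHOLD=60000    # arbitrary "dangerously close" mark

current=$(awk '$1 == "memory" { print $3 }' /proc/cgroups)   # num_cgroups column

if [ "$current" -ge "$THRESHOLD" ]; then
    echo "memory cgroups: $current of $LIMIT, dropping caches"
    echo 3 > /proc/sys/vm/drop_caches
fi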

The real solution is to use a kernel with the above-mentioned patches (or to backport those patches to the kernel you use).

Another possible workaround might be to not enable kernel memory accounting for all containers (i.e. reverting https://github.com/opencontainers/runc/pull/1350).

It might also be possible to use drop_caches right from runc itself (which is an ugly hack, but might be marginally better than using drop_caches from cron).
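
To check whether kernel memory accounting is in effect for a given container, you can look at the kmem counters in its memory cgroup; a non-zero usage suggests kmem accounting was enabled for that cgroup. This is just a sketch: the container name is a placeholder, and the path assumes cgroup v1 with the cgroupfs driver.

CID=$(docker inspect --format '{{.Id}}' <container-name>)
cat /sys/fs/cgroup/memory/docker/$CID/memory.kmem.limit_in_bytes
cat /sys/fs/cgroup/memory/docker/$CID/memory.kmem.usage_in_bytes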

We came across the same issue on Docker 17.09, kernel 3.10.0. We tried to clear the memory with echo 1 > /proc/sys/vm/drop_caches, but that did not work. In the end we restarted the server, which solved the problem. However, we cannot restart the server every time, so does anyone have a solution that doesn’t require restarting the system?

After running Docker smoothly for over 2 years I’m getting this issue as well. Worse, I restarted and I still have 6 containers (out of 45) that are not starting for this reason!

Humm

I have been running this setup for over a year now. The 6 containers that are not starting are all caddy 0.10.14 containers. I have other caddy containers that run normally.

All the commands I ran

uname -a; echo; echo;
docker info; echo; echo;
docker version; echo; echo;
docker ps -a | wc -l; echo; echo;
ls -l -F /sys/fs/cgroup/memory/docker/ | grep / | wc -l; echo; echo;
mount | wc -l; echo; echo;
cat /proc/cgroups | grep memory; echo; echo;
cat /proc/self/mountinfo | wc -l; echo; echo;
ls -1 /sys/fs/cgroup/cpuset/docker | wc -l; echo; echo;
find /sys/fs/cgroup/memory -type d ! -path '/sys/fs/cgroup/memory/docker*' | wc -l

Results

root@my-vps:~/deploy-setup# uname -a; echo; echo;
Linux my-vps 4.10.0-24-generic #28-Ubuntu SMP Wed Jun 14 08:14:34 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

root@my-vps:~/deploy-setup# docker info; echo; echo;

Containers: 45
 Running: 44
 Paused: 0
 Stopped: 1
Images: 49
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: fmbi1a5nn9sp5o4qy3eyazeq5
 Is Manager: true
 ClusterID: lzc3rrzjgu41053qywhym8jdg
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 123.123.123.23
 Manager Addresses:
  123.123.123.23:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.10.0-24-generic
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.782GiB
Name: my-vps
ID: X5WW:PFNN:WZU7:OMCH:EXFN:N6TL:KMS4:GEHQ:WJLZ:J7DS:IHWX:I5JZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: devmtl
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

root@my-vps:~/deploy-setup# docker version; echo; echo;
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false

root@my-vps:~/deploy-setup# docker ps -a | wc -l; echo; echo;
46

root@my-vps:~/deploy-setup# ls -l -F /sys/fs/cgroup/memory/docker/ | grep / | wc -l; echo; echo;
44

root@my-vps:~/deploy-setup# mount | wc -l; echo; echo;
172

root@my-vps:~/deploy-setup# cat /proc/cgroups | grep memory; echo; echo;
memory	2	278	1

root@my-vps:~/deploy-setup# cat /proc/self/mountinfo | wc -l; echo; echo;
172

root@my-vps:~/deploy-setup# ls -1 /sys/fs/cgroup/cpuset/docker | wc -l; echo; echo;
61

root@my-vps:~/deploy-setup# find /sys/fs/cgroup/memory -type d ! -path '/sys/fs/cgroup/memory/docker*' | wc -l
201

There are no guarantees, but I’ve tested this for over half a year and have tried Linux versions 3.10, 4.4, 4.14, 4.18, and 5.0. It seems that updating the kernel to 4.18 or higher fixes this issue.

I tested this by creating a deployment that keeps launching containers that request more memory than their limit, so they are OOM-killed as soon as they start. On kernel 4.14 or lower, this eventually causes the docker command, or even ps, to hang; the only way I could recover from that hang was to reboot the whole node.

This never happened again after I upgraded to kernel 4.18.
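
A rough way to reproduce something along those lines (the stress image, loop count, and sizes below are placeholders, not the deployment described above) is to repeatedly start containers whose workload exceeds their memory limit and watch the memory-cgroup count in /proc/cgroups climb:

# Each run allocates more memory than its 50 MB limit, so the container is OOM-killed.
for i in $(seq 1 500); do
    docker run --rm -m 50m progrium/stress --vm 1 --vm-bytes 128M --timeout 5s
    awk '$1 == "memory" { print "memory cgroups:", $3 }' /proc/cgroups
done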

For background on how ps can hang forever, here is an article.

Yes, it’s exactly the kernel 3.10 bug. I tested on CentOS 7.4 (3.10.0-693.11.1.el7); the latest official stable kernel has the same problem. Many legacy systems in my production environment depend on the CentOS 7.x kernel, so we can’t upgrade to a 4.x kernel.

We are hitting the same error and found the reason through a lot of testing. The root cause is a kernel cgroup bug. (My operating system is CentOS 7.3 with the 3.10.0-514.10.2.el7.x86_64 kernel.)

If you create a container with the cgroup kernel memory option enabled, you will hit this bug. When you delete the container, you may see the cgroup memory count decrease as expected, but if you test carefully over a long period you will find that the cgroup kernel memory space actually leaks and is never released. I wrote a test case to reproduce this issue, related to: https://github.com/kubernetes/kubernetes/issues/61937

It can also be reproduced with Docker:

# docker run -d --name test001 --kernel-memory 100M sshdserver:v1 
WARNING: You specified a kernel memory limit on a kernel older than 4.0. Kernel memory limits are experimental on older kernels, it won't work as expected and can cause your system to be unstable.
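
A simple way to observe the leak (the busybox image and loop count here are just for illustration) is to create and delete such containers in a loop and compare the memory controller’s num_cgroups before and after:

before=$(awk '$1 == "memory" { print $3 }' /proc/cgroups)

for i in $(seq 1 100); do
    docker run -d --name kmem-test-$i --kernel-memory 100M busybox sleep 1
    docker rm -f kmem-test-$i
done

after=$(awk '$1 == "memory" { print $3 }' /proc/cgroups)
echo "memory cgroups before=$before after=$after"   # on affected kernels, "after" stays inflated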

We also found that k8s 1.9 enables this cgroup kernel memory feature by default, while k8s 1.6 disables it by default. So if you run k8s 1.9 on CentOS 7.x, you must change the code to disable this option.

More detail can be found at: http://www.linuxfly.org/kubernetes-19-conflict-with-centos7/ (written in Chinese)

I noticed that this issue shows up faster when I create a faulty service (one that always fails on startup) and then set its restart policy to always: wait about 4 days and the issue pops up. Normally it takes about a month until a node starts throwing those errors.

So maybe it’s related to that?
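
For anyone who wants to try that, a sketch of such a setup (the service name, image, and restart delay are placeholders; this assumes a Swarm service whose task always exits non-zero and is restarted unconditionally):

# The task exits immediately with an error, so Swarm keeps restarting it,
# creating a fresh memory cgroup on every attempt.
docker service create --name always-failing \
    --restart-condition any --restart-delay 10s \
    busybox false

# Watch the memory-cgroup count grow over time:
watch -n 60 'grep ^memory /proc/cgroups'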

@kolyshkin Thanks for your valuable feedback. I have a Kubernetes cluster that has been hitting this issue for a year, and I have to reboot each node after 5~7 days of uptime. You can reach me if you need any kind of information.

Regards,

Same issue here, Ubuntu 18.04, docker 17.03, kernel 4.15.0-1025-aws

The memory cgroup count just grows and grows until a reboot is required to launch new containers. In my case, once near the limit I quickly reach 100% CPU utilization and the server becomes unresponsive.

Containers: 66
 Running: 64
 Paused: 2
 Stopped: 0
Images: 32
Server Version: 17.03.3-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6c463891b1ad274d505ae3bb738e530d1df2b3c7
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-1025-aws
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.1 GiB
Name: ip-172-31-17-38
ID: 5Y2E:3AWK:6JVH:Y3K2:CQDR:JN34:V6SR:VZXF:2WWO:R7F2:3GEY:6ECH
Docker Root Dir: /var/lib/evaldocker
Debug Mode (client): false
Debug Mode (server): false
Username: toniceval
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

@mlaventure writes:

@BenHall Every directory under the mount point (including the mount point) is considered to be a cgroup. So to get the actual number of cgroups from the FS you would have to run: find /sys/fs/cgroup/memory -type d | wc -l and that should match the number found in /proc/cgroups

It turns out that this is not always the case. I corresponded with a Linux cgroups maintainer (Michal Hocko) recently, who said:

Please note that memcgs are completely removed after the last memory accounted to them disappears. And that happens lazily on the memory pressure. So it is quite possible that this happens much later than the actual rmdir on the memcg.

So, it’s not uncommon for the num_cgroups value in /proc/cgroups to differ from what you might see in lscgroup.
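
A quick way to see that discrepancy on an affected host (lscgroup ships with the libcgroup tools) is to compare the kernel’s counter with what is actually visible in the filesystem:

# num_cgroups as reported by the kernel (per the comment above, this can lag behind rmdir)
awk '$1 == "memory" { print "kernel num_cgroups:", $3 }' /proc/cgroups

# memory cgroup directories that still exist in the filesystem
find /sys/fs/cgroup/memory -type d | wc -l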

@qkboy may be worth opening a ticket with Red Hat to backport the fix to their 3.10.x kernel

There is indeed a kernel memory leak up to the 4.0 kernel release. You can follow this link for details: https://github.com/moby/moby/issues/6479#issuecomment-97503551

Nothing stands out in the mountinfo output, unfortunately. This may be an issue with that version of the kernel, but I haven’t found any reference to a similar issue so far. I’m having a look at the cgroup_rmdir code just in case.

OK, I’ve got some news, good and bad.

The good news is that the problem is supposedly fixed in the v5.3-rc1 kernel (see the patches from Roman Gushchin, on top of earlier patches by Vladimir Davydov). For an overall description, see https://lwn.net/Articles/790384/.

The bad news is that I don’t know what Docker Engine can do to work around the problem on earlier kernels. One approach would be to revert https://github.com/opencontainers/runc/pull/1350 (i.e. not enable kmem accounting by default), but I doubt that would be accepted.

Looking for other alternatives…

About the memory cgroups leaking; reading this comment: https://github.com/moby/moby/issues/24559#issuecomment-232436302

Also tracked here: https://bugzilla.kernel.org/show_bug.cgi?id=124641. The fix is also going to be backported to 4.4: https://lkml.org/lkml/2016/7/13/864

Thanks, restarting helped us too; it resets the number and allows containers to start again.

@BenHall what else do you have under /sys/fs/cgroup/memory/ excluding the docker dir? (something like find /sys/fs/cgroup/memory -type d ! -path '/sys/fs/cgroup/memory/docker*' should work).

You can also check that you don’t have the memory cgroup mounted somewhere else by checking the output of mount or cat /proc/self/mountinfo
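
For example:

mount -t cgroup | grep memory
grep -w memory /proc/self/mountinfo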