runtime: Process keeps getting killed for violating docker cgroup mem constraint with server GC

When running the small program below with the environment variable COMPlus_gcServer=1 and a memory constraint of 90M, the process keeps getting killed by the kernel for violating the cgroup memory limit. With COMPlus_gcServer=0 it runs fine. It looks to me like the server GC tries to stay just under the cgroup memory limit, but misses by a few pages (unmanaged or kernel memory?) for some reason.
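
For reference, a repro invocation would look something like this (the memory limit, the environment variable, and the image are taken from this report; the published app path is a placeholder):

    docker run --rm -m 90m \
        -e COMPlus_gcServer=1 \
        -v "$PWD/publish:/app" \
        microsoft/dotnet:2.0.5-runtime \
        dotnet /app/Repro.dll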

    using System;
    using System.Collections.Generic;
    using System.IO;

    class Program
    {
        static void Main(string[] args)
        {
            int num = 1000;
            var list = new List<object>(num);
            for (int i = 0; ; i++)
            {
                // Allocate an 80 KB buffer on every iteration.
                var length = 80 * 1024;
                var buffer = new byte[length];

                // Create and immediately dispose an empty file, then delete it.
                // Removing this block makes the OOM kills go away (see below).
                using (var stream = new FileStream("/tmp/test", FileMode.CreateNew))
                {
                }
                File.Delete("/tmp/test");

                // Keep at most `num` buffers alive, so the managed working set
                // stays bounded at roughly num * 80 KB (about 80 MB).
                if (list.Count < num)
                    list.Add(buffer);
                else
                    list[num - 1] = buffer;
            }
        }
    }

If I do not include the FileStream creation, the program is not killed.
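
To see how far the runtime-visible numbers are from what the cgroup charges the container, a minimal diagnostic sketch like the following can be called from inside the loop (DumpMemory is a hypothetical helper; the paths assume cgroup v1, which is what this kernel/cgroupfs setup uses):

    // Hypothetical helper: compare the cgroup v1 accounting with what the
    // runtime itself can see. The gap between them is unmanaged/kernel
    // memory that the GC cannot observe.
    static void DumpMemory()
    {
        long limit = long.Parse(File.ReadAllText("/sys/fs/cgroup/memory/memory.limit_in_bytes"));
        long usage = long.Parse(File.ReadAllText("/sys/fs/cgroup/memory/memory.usage_in_bytes"));
        long managed = GC.GetTotalMemory(forceFullCollection: false);
        long workingSet = Environment.WorkingSet;
        Console.WriteLine($"cgroup {usage}/{limit}, managed {managed}, working set {workingSet}");
    }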

The kernel log is listed below.

dotnet invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=0
dotnet cpuset=033312b058a18b97fb43a0f7c76cc16c4a1233023fc12d20b13cf62896ab3422 mems_allowed=0
CPU: 3 PID: 19800 Comm: dotnet Tainted: G           O    4.4.104-boot2docker #1
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 0000000000000001 ffffffff812d68a4 ffff880282410c00 ffffffff817abaf5
 ffffffff81152675 0000000000000086 0000000000000002 0000000000016700
 0000000000000046 ffff880282410c00 ffffffff817abaf5 ffff8800db277e38
Call Trace:
 [<ffffffff812d68a4>] ? dump_stack+0x5a/0x6f
 [<ffffffff81152675>] ? dump_header+0x5c/0x1b9
 [<ffffffff8110ba48>] ? oom_kill_process+0x83/0x34a
 [<ffffffff8114d661>] ? mem_cgroup_iter+0x1b/0x1a1
 [<ffffffff8114ecab>] ? mem_cgroup_out_of_memory+0x20b/0x22c
 [<ffffffff8158f703>] ? _raw_spin_unlock_irqrestore+0xb/0xc
 [<ffffffff8114f54f>] ? mem_cgroup_oom_synchronize+0x25a/0x26c
 [<ffffffff8114b9f3>] ? mem_cgroup_is_descendant+0x1d/0x1d
 [<ffffffff8110bff9>] ? pagefault_out_of_memory+0x1a/0x73
 [<ffffffff81591b78>] ? page_fault+0x28/0x30
Task in /docker/033312b058a18b97fb43a0f7c76cc16c4a1233023fc12d20b13cf62896ab3422 killed as a result of limit of /docker/033312b058a18b97fb43a0f7c76cc16c4a1233023fc12d20b13cf62896ab3422
memory: usage 91944kB, limit 92160kB, failcnt 12371
memory+swap: usage 184320kB, limit 184320kB, failcnt 22
kmem: usage 3240kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /docker/033312b058a18b97fb43a0f7c76cc16c4a1233023fc12d20b13cf62896ab3422: cache:0KB rss:88704KB rss_huge:10240KB mapped_file:0KB dirty:0KB writeback:0KB swap:92376KB inactive_anon:44532KB active_anon:44168KB inactive_file:0KB active_file:0KB unevictable:4KB
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[19800]     0 19800  5102336    27141     170      11    23109             0 dotnet
Memory cgroup out of memory: Kill process 19800 (dotnet) score 1094 or sacrifice child
Killed process 19800 (dotnet) total-vm:20409344kB, anon-rss:88380kB, file-rss:20184kB

### Some details

$ uname -a
Linux default 4.4.104-boot2docker #1 SMP Fri Dec 8 19:23:27 UTC 2017 x86_64 GNU/Linux
$ docker info
Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 247
Server Version: 17.09.1-ce
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 228
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 896gr7p74gkp0b7f8wpm2cyjw
 Is Manager: true
 ClusterID: ln33cteihinqb7y34715b37un
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.99.100
 Manager Addresses:
  192.168.99.100:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.4.104-boot2docker
Operating System: Boot2Docker 17.09.1-ce (TCL 7.2); HEAD : e7de9ae - Fri Dec  8 19:41:36 UTC 2017
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 9.514GiB
Name: default
ID: L7OH:SH27:OOUQ:ATGD:H75B:AAYK:ZWUR:3ZHK:55ZR:JLHM:MTL5:XC6Z
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 52
 Goroutines: 162
 System Time: 2018-03-13T11:25:04.790227243Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Docker image: microsoft/dotnet:2.0.5-runtime

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 26 (15 by maintainers)

Most upvoted comments

We face the same issue. Switching from .NET Core 2.2 to 3.1 made it even worse for our application. Are there any recommendations for temporary workarounds? Waiting for .NET Core 5.0 (November 2020) isn't a great solution…
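
One possible stopgap on .NET Core 3.0+ (a sketch, assuming the GC hard-limit knobs introduced in 3.0 apply to this situation) is to cap the GC heap at a fraction of the container limit via runtimeconfig.json; 75 here is just an example value:

    {
      "runtimeOptions": {
        "configProperties": {
          "System.GC.HeapHardLimitPercent": 75
        }
      }
    }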

We are facing similar problems at the moment. I'm not a .NET dev, but I am quite sure we switched to version 3.x and are still facing these issues. Any updates or hints on this?