moby: Cannot start container: Getting the final child's pid from pipe caused "EOF"

Description

Intermittently containers cannot be started and docker returns the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"EOF\"": unknown.

Restarting the machine resolves the issue.

Steps to reproduce the issue:

  1. docker run --rm -it img /bin/bash

Describe the results you received: docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"EOF\"": unknown.

Describe the results you expected: Container should start

Additional information you deem important (e.g. issue happens only occasionally): The issue happens only occasionally. It appears to be connected to https://github.com/moby/moby/issues/37722 and https://github.com/docker/for-linux/issues/856. Feel free to close if there's nothing Docker can do about it.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b
 Built:             Wed Mar 11 01:27:04 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b
  Built:            Wed Mar 11 01:25:42 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 19
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: journald
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1062.18.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 6.492GiB
 Name: fs2
 ID: K2B7:KJGO:JNN3:PQWZ:E7AS:YUUY:YCUH:ALUD:2QXI:TEPE:Q6V3:NIEZ
 Docker Root Dir: /data/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

Additional environment details (AWS, VirtualBox, physical, etc.): VM

/var/log/messages entries

Apr 20 09:13:21 xxx kernel: runc:[1:CHILD]: page allocation failure: order:4, mode:0xc0d0
Apr 20 09:13:21 xxx kernel: CPU: 1 PID: 3522 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.18.1.el7.x86_64 #1
Apr 20 09:13:21 xxx kernel: Hardware name: OpenStack Foundation OpenStack Nova, BIOS 2:1.10.2-58953eb7 04/01/2014
Apr 20 09:13:21 xxx kernel: Call Trace:
Apr 20 09:13:21 xxx kernel: [<ffffffffa6d7b416>] dump_stack+0x19/0x1b
Apr 20 09:13:21 xxx kernel: [<ffffffffa67c3fc0>] warn_alloc_failed+0x110/0x180
Apr 20 09:13:21 xxx kernel: [<ffffffffa6d7698a>] __alloc_pages_slowpath+0x6bb/0x729
Apr 20 09:13:21 xxx kernel: [<ffffffffa67c8636>] __alloc_pages_nodemask+0x436/0x450
Apr 20 09:13:21 xxx kernel: [<ffffffffa6816c58>] alloc_pages_current+0x98/0x110
Apr 20 09:13:21 xxx kernel: [<ffffffffa67e3658>] kmalloc_order+0x18/0x40
Apr 20 09:13:21 xxx kernel: [<ffffffffa6822216>] kmalloc_order_trace+0x26/0xa0
Apr 20 09:13:21 xxx kernel: [<ffffffffa68261a1>] __kmalloc+0x211/0x230
Apr 20 09:13:21 xxx kernel: [<ffffffffa683f041>] memcg_alloc_cache_params+0x81/0xb0
Apr 20 09:13:21 xxx kernel: [<ffffffffa67e3304>] do_kmem_cache_create+0x74/0xf0
Apr 20 09:13:21 xxx kernel: [<ffffffffa67e3482>] kmem_cache_create+0x102/0x1b0
Apr 20 09:13:21 xxx kernel: [<ffffffffc069bdd1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack]
Apr 20 09:13:21 xxx kernel: [<ffffffffc069c6d4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack]
Apr 20 09:13:21 xxx kernel: [<ffffffffa6c44054>] ops_init+0x44/0x150
Apr 20 09:13:21 xxx kernel: [<ffffffffa6c44203>] setup_net+0xa3/0x160
Apr 20 09:13:21 xxx kernel: [<ffffffffa6c449a5>] copy_net_ns+0xb5/0x180
Apr 20 09:13:21 xxx kernel: [<ffffffffa66cb599>] create_new_namespaces+0xf9/0x180
Apr 20 09:13:21 xxx kernel: [<ffffffffa66cb7da>] unshare_nsproxy_namespaces+0x5a/0xc0
Apr 20 09:13:21 xxx kernel: [<ffffffffa669afeb>] SyS_unshare+0x1cb/0x340
Apr 20 09:13:21 xxx kernel: [<ffffffffa6d8dede>] system_call_fastpath+0x25/0x2a
Apr 20 09:13:21 xxx kernel: Mem-Info:
Apr 20 09:13:21 xxx kernel: active_anon:58019 inactive_anon:86369 isolated_anon:0#012 active_file:75955 inactive_file:1180913 isolated_file:0#012 unevictable:6102 dirty:95 writeback:0 unstable:0#012 slab_reclaimable:90514 slab_unreclaimable:48526#012 mapped:35231 shmem:143 pagetables:3532 bounce:0#012 free:62074 free_pcp:0 free_cma:0
Apr 20 09:13:21 xxx kernel: Node 0 DMA free:15908kB min:156kB low:192kB high:232kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 20 09:13:21 xxx kernel: lowmem_reserve[]: 0 2830 6628 6628
Apr 20 09:13:21 xxx kernel: Node 0 DMA32 free:127088kB min:28792kB low:35988kB high:43188kB active_anon:76456kB inactive_anon:137404kB active_file:221232kB inactive_file:2012980kB unevictable:4900kB isolated(anon):0kB isolated(file):0kB present:3129200kB managed:2898768kB mlocked:4900kB dirty:184kB writeback:0kB mapped:107104kB shmem:304kB slab_reclaimable:155508kB slab_unreclaimable:62240kB kernel_stack:1584kB pagetables:4852kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 20 09:13:21 xxx kernel: lowmem_reserve[]: 0 0 3798 3798
Apr 20 09:13:21 xxx kernel: Node 0 Normal free:105300kB min:38632kB low:48288kB high:57948kB active_anon:155620kB inactive_anon:208072kB active_file:82588kB inactive_file:2710672kB unevictable:19508kB isolated(anon):0kB isolated(file):0kB present:4022272kB managed:3892156kB mlocked:19508kB dirty:196kB writeback:0kB mapped:33820kB shmem:268kB slab_reclaimable:206548kB slab_unreclaimable:131864kB kernel_stack:2256kB pagetables:9276kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 20 09:13:21 xxx kernel: lowmem_reserve[]: 0 0 0 0
Apr 20 09:13:21 xxx kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Apr 20 09:13:21 xxx kernel: Node 0 DMA32: 9256*4kB (UEM) 8488*8kB (UEM) 1288*16kB (UEM) 25*32kB (UM) 9*64kB (M) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 127040kB
Apr 20 09:13:21 xxx kernel: Node 0 Normal: 14942*4kB (UEM) 5699*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 105360kB
Apr 20 09:13:21 xxx kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 20 09:13:21 xxx kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 20 09:13:21 xxx kernel: 1260617 total pagecache pages
Apr 20 09:13:21 xxx kernel: 1355 pages in swap cache
Apr 20 09:13:21 xxx kernel: Swap cache stats: add 45752, delete 44397, find 6635088/6637528
Apr 20 09:13:21 xxx kernel: Free swap  = 4148988kB
Apr 20 09:13:21 xxx kernel: Total swap = 4194300kB
Apr 20 09:13:21 xxx kernel: 1791866 pages RAM
Apr 20 09:13:21 xxx kernel: 0 pages HighMem/MovableOnly
Apr 20 09:13:21 xxx kernel: 90158 pages reserved
Apr 20 09:13:21 xxx kernel: kmem_cache_create(nf_conntrack_ffff9b3827d29480) failed with error -12
Apr 20 09:13:21 xxx kernel: CPU: 1 PID: 3522 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.18.1.el7.x86_64 #1
Apr 20 09:13:21 xxx kernel: Hardware name: OpenStack Foundation OpenStack Nova, BIOS 2:1.10.2-58953eb7 04/01/2014
Apr 20 09:13:21 xxx kernel: Call Trace:
Apr 20 09:13:21 xxx kernel: [<ffffffffa6d7b416>] dump_stack+0x19/0x1b
Apr 20 09:13:21 xxx kernel: [<ffffffffa67e3507>] kmem_cache_create+0x187/0x1b0
Apr 20 09:13:21 xxx kernel: [<ffffffffc069bdd1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack]
Apr 20 09:13:21 xxx kernel: [<ffffffffc069c6d4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack]
Apr 20 09:13:21 xxx kernel: [<ffffffffa6c44054>] ops_init+0x44/0x150
Apr 20 09:13:21 xxx kernel: [<ffffffffa6c44203>] setup_net+0xa3/0x160
Apr 20 09:13:21 xxx kernel: [<ffffffffa6c449a5>] copy_net_ns+0xb5/0x180
Apr 20 09:13:21 xxx kernel: [<ffffffffa66cb599>] create_new_namespaces+0xf9/0x180
Apr 20 09:13:21 xxx kernel: [<ffffffffa66cb7da>] unshare_nsproxy_namespaces+0x5a/0xc0
Apr 20 09:13:21 xxx kernel: [<ffffffffa669afeb>] SyS_unshare+0x1cb/0x340
Apr 20 09:13:21 xxx kernel: [<ffffffffa6d8dede>] system_call_fastpath+0x25/0x2a
Apr 20 09:13:21 xxx kernel: Unable to create nf_conn slab cache
Apr 20 09:13:21 xxx containerd: time="2020-04-20T09:13:21.565957368+02:00" level=info msg="shim reaped" id=a8da2d54ebf451c4ee6118a276b2e4a10f3d6f61ebc52b853c34419ba4c132bb
Apr 20 09:13:21 xxx dockerd: time="2020-04-20T09:13:21.579186951+02:00" level=error msg="stream copy error: reading from a closed fifo"
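
The "order:4" in the trace above means the kernel needed 16 physically contiguous pages (64 KiB with 4 KiB pages) for the per-namespace nf_conntrack cache and could not find them, even though plenty of memory was free in smaller fragments (note the all-zero higher-order buckets in the "Node 0 Normal" line). A quick diagnostic sketch:

```shell
# /proc/buddyinfo lists free blocks per allocation order; all-zero
# higher-order columns mean free memory exists only in small fragments.
cat /proc/buddyinfo
# An order-4 allocation is 2^4 = 16 contiguous 4 KiB pages:
echo "order-4 block: $((16 * 4)) KiB"
```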

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 41
  • Comments: 51 (7 by maintainers)

Most upvoted comments

It may not matter, but I got the same error message when creating a deployment on GKE (ver. 15.12.2). After investigating, I found a mistake in the manifest file:

        resources:
          limits:
            cpu: 1100m
            memory: 3000m  # correct: 3000M
          requests:
            cpu: 1100m
            memory: 3000m # correct: 3000M

JFYI.
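
For context on why that one-letter suffix matters: in Kubernetes resource quantities, a lowercase m means milli (1/1000), so memory: 3000m requests 3 bytes, while 3000M is 3000 megabytes (and 3000Mi would be mebibytes). A sketch of the difference:

```shell
# Kubernetes quantity suffixes applied to "3000":
echo "3000m  = $((3000 / 1000)) bytes"           # milli: effectively nothing
echo "3000M  = $((3000 * 1000 * 1000)) bytes"    # megabytes (10^6)
echo "3000Mi = $((3000 * 1024 * 1024)) bytes"    # mebibytes (2^20)
```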

I had the same problem, and I fixed it by increasing kernel.pid_max.

Way 1:

$ sysctl -n kernel.pid_max
32768
$ sysctl -w kernel.pid_max=100000

Way 2:

$ sysctl -n user.max_user_namespaces
0  
# if zero try this
$ sysctl -w user.max_user_namespaces=15000

Way 3:

$ grep -w 'runc:\[1:CHILD\]: page allocation failure' /var/log/messages | tail -n 4
Nov 20 16:13:54 ETL010080 kernel: runc:[1:CHILD]: page allocation failure: order:4, mode:0x10c0d0
Nov 20 16:15:46 ETL010080 kernel: runc:[1:CHILD]: page allocation failure: order:4, mode:0x10c0d0
Nov 20 16:16:28 ETL010080 kernel: runc:[1:CHILD]: page allocation failure: order:4, mode:0x10c0d0
Nov 20 16:16:41 ETL010080 kernel: runc:[1:CHILD]: page allocation failure: order:4, mode:0x10c0d0

Solution 1:

echo 3 > /proc/sys/vm/drop_caches

Solution 2:

echo 1 > /proc/sys/vm/compact_memory
# or
sysctl -w vm.compact_memory=1
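
Note that sysctl -w changes are lost on reboot. To make the suggested tweaks permanent, one option is a drop-in file under /etc/sysctl.d (a sketch; the file name and values here are illustrative, taken from the comments above, and should be tuned per workload):

```shell
# Persist the sysctl tweaks suggested above across reboots.
cat > /etc/sysctl.d/99-container-limits.conf <<'EOF'
kernel.pid_max = 100000
user.max_user_namespaces = 15000
EOF
sysctl --system   # re-apply all sysctl configuration files
```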

Same issue here. This issue tends to happen when there are a large number of containers running on the host (33 this time).

Okay, just found another interesting post on another forum. If you are running a VPS virtualized with Virtuozzo, your hosting provider may have limited your tasks… I'm using Strato and it seems they have limited my server. Under /proc/user_beancounters you can find those settings. numproc is set to 700 and my actual held is 661. Starting a bigger Docker stack seems to be impossible…

You can find more in this post https://serverfault.com/questions/1017994/docker-compose-oci-runtime-create-failed-pthread-create-failed/1018402

It seems there is no bug…

Just found the following indicating that it’s a configuration issue on my side:

sysctl -w user.max_user_namespaces=15000

https://github.com/docker/docker.github.io/issues/7962#issuecomment-450889365

My case was the swap size not being enough, and it can be quickly fixed by this:

(Quoting the earlier comment above that suggests increasing kernel.pid_max, raising user.max_user_namespaces, and the drop_caches/compact_memory fixes.)

For me, I only ran the way-3 fix; you can put it in a cronjob to run every few hours/days. The long-term fix is to increase the swap size of the machine(s).

I solved it by removing MountFlags=private from the systemd config of the docker service, though I can neither say when exactly this started to become an issue, nor remember why I added it.

I fixed this issue by increasing the memory request in Kubernetes. The low memory limit did not let the process execute, and this message was returned.

I’ve had the same issue starting WebODM. Turns out there was an option called oom_score_adj in their docker-compose.yml, which adjusts how likely the Linux kernel's OOM killer is to kill a process that consumes too much memory too fast.

Commenting out that option for redis and opendronemap/webodm_db solved it for me.

@ceecko The error is during container bootstrap. While it is “before the container starts”, it is part of the startup process.

Explaining how runc works is kind of tricky on a forum… but it basically requires multiple re-execs: runc executes, sets up some pipes, then re-executes and sends some data over the pipes to the original process. The EOF comes because the pipe writer (the re-exec’d process) closed the pipe before the reader got the data it was expecting. This would pretty much be due to a crash… which I would generally pin on an OOM kill.

Because of the nature of the crash, it is really difficult to debug where it is coming from. Since a reboot helps, it would seem to be resource related.
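
The EOF pattern described above can be sketched in shell (a toy illustration that assumes nothing about runc's actual protocol): a child that should report its PID dies before writing, so the parent reads nothing.

```shell
# A child is expected to report its PID through a pipe, but "crashes"
# (simulated with exit) before writing. The parent's read then hits EOF
# and comes back empty -- analogous to runc's "getting the final child's
# pid from pipe caused EOF".
pid=$( (exit 1; echo "$$") )   # subshell exits before echoing its PID
if [ -z "$pid" ]; then
    echo "EOF: child died before reporting its pid"
fi
```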

I just restarted the Docker service (systemctl restart docker.service) and it worked for me. See: https://github.com/microsoft/vscode-docker/issues/1963#issuecomment-627254890

(Quoting the MountFlags=private comment above.)

In my case (CentOS 7.8), it was MountFlags=slave, and I solved it by removing the whole line and restarting the Docker daemon.

The user.max_user_namespaces kernel parameter doesn’t seem to help in my case

~> sysctl -n user.max_user_namespaces
15000
~> docker run --rm -it busybox date
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"EOF\"": unknown.

Never mind, I got it to work by enabling privileged mode.

In my k8s environment, it seems the sandbox container couldn’t find the user-created container’s resources (pid, files, etc.), because the user-created container finished too fast. The container’s job just prints the date and ends within 1 second. I created the following container with a one-minute cronjob and tested it:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: ysoftman-cronjob-test
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: ysoftman-cronjob-test
            image: busybox
            imagePullPolicy: Always
            command:
            - /bin/sh
            - -c
            - date

The error no longer occurred after I added a ‘sleep 5’ command to my container:

  command:
  - /bin/sh
  - -c
  - date; sleep 5

Does this help? sysctl -w vm.min_free_kbytes=540672. I was facing the same issue; after increasing that setting, it was working for me again. No restart required.
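
For reference, vm.min_free_kbytes raises the kernel's free-memory watermarks, which leaves more headroom for higher-order allocations like the order:4 one in the log above, at the cost of memory available for page cache. A sketch for checking the current value before changing it (the 540672 figure comes from the comment above, not from any official guidance):

```shell
# Inspect the current watermark; on many distros it is auto-sized from RAM.
sysctl -n vm.min_free_kbytes
# Raise it (pick a value appropriate for your machine):
sysctl -w vm.min_free_kbytes=540672
```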