moby: containerd process continually restarting: process.json: no such file or directory
Description
Shortly after startup, during a period of heavy system resource usage, I am seeing errors in from journalctl
showing
Feb 12 19:22:19 example.com dockerd[883]: time="2017-02-12T19:22:19.602401661-05:00" level=info msg="libcontainerd: new containerd process, pid: 22486"
Feb 12 19:22:20 example.com dockerd[883]: time="2017-02-12T19:22:20.109646713-05:00" level=fatal msg="open /var/run/docker/libcontainerd/containerd/a4169549305be1652784c635dd42cf0c84ba81be928cd430ab2608f0bf7fb862/1c53926e88b4be112f691600ed84a840c2d42032e2488f349bfe8d476a7ac42e/process.json: no such file or directory"
It then proceeds to keep spawning a new containerd, and then it dies immediately. The result is a system which can’t properly interact with containers, no docker stop/exec, health checks always fail and return unhealthy.
docker exec -i -t container-name /bin/bash rpc error: code = 14 desc = grpc: the connection is unavailable
When inspecting the running processes we can see that the usual docker-containerd
isn’t running
Steps to reproduce the issue: No easy reproduction
- Start up 3-4 containers on a system with health check defined (small light weight ones)
- Put the system under a heavy load (max out CPU usage)
- Do this a lot of times, some machines will recover quickly. others will enter the failed state looking for
process.json
Describe the results you received:
Feb 12 19:22:17 example.com dockerd[883]: time="2017-02-12T19:22:17.339757786-05:00" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
Feb 12 19:22:17 example.com dockerd[883]: time="2017-02-12T19:22:17.345519413-05:00" level=warning msg="Health check for container a4169549305be1652784c635dd42cf0c84ba81be928cd430ab2608f0bf7fb862 error: rpc error: code = 13 desc = transport is closing"
Feb 12 19:22:17 example.com dockerd[883]: time="2017-02-12T19:22:17.345855590-05:00" level=warning msg="Health check for container 6c973e40946c04543d6e444081bc7765e1c29e55c03168222c2ee4b0b3073756 error: rpc error: code = 13 desc = transport is closing"
Feb 12 19:22:17 example.com dockerd[883]: time="2017-02-12T19:22:17.346003309-05:00" level=warning msg="Health check for container 22310406af918a0af7e90c2e8234873237057ec84ccb636244b2b1c477416270 error: rpc error: code = 13 desc = transport is closing"
Feb 12 19:22:18 example.com dockerd[883]: time="2017-02-12T19:22:18.644524466-05:00" level=info msg="libcontainerd: new containerd process, pid: 22478"
Feb 12 19:22:19 example.com dockerd[883]: time="2017-02-12T19:22:19.602401661-05:00" level=info msg="libcontainerd: new containerd process, pid: 22486"
Feb 12 19:22:20 example.com dockerd[883]: time="2017-02-12T19:22:20.109646713-05:00" level=fatal msg="open /var/run/docker/libcontainerd/containerd/a4169549305be1652784c635dd42cf0c84ba81be928cd430ab2608f0bf7fb862/1c53926e88b4be112f691600ed84a840c2d42032e2488f349bfe8d476a7ac42e/process.json: no such file or directory"
Feb 12 19:22:20 example.com dockerd[883]: time="2017-02-12T19:22:20.909410988-05:00" level=info msg="libcontainerd: new containerd process, pid: 22494"
Feb 12 19:22:20 example.com dockerd[883]: time="2017-02-12T19:22:20.933222703-05:00" level=fatal msg="open /var/run/docker/libcontainerd/containerd/a4169549305be1652784c635dd42cf0c84ba81be928cd430ab2608f0bf7fb862/1c53926e88b4be112f691600ed84a840c2d42032e2488f349bfe8d476a7ac42e/process.json: no such file or directory"
Feb 12 19:22:22 example.com dockerd[883]: time="2017-02-12T19:22:22.409392219-05:00" level=info msg="libcontainerd: new containerd process, pid: 22500"
Feb 12 19:22:22 example.com dockerd[883]: time="2017-02-12T19:22:22.43553991-05:00" level=fatal msg="open /var/run/docker/libcontainerd/containerd/a4169549305be1652784c635dd42cf0c84ba81be928cd430ab2608f0bf7fb862/1c53926e88b4be112f691600ed84a840c2d42032e2488f349bfe8d476a7ac42e/process.json: no such file or directory"
and the last couple lines repeat forever, with the occasional failed health-check message
Describe the results you expected:
recovery looks like:
Feb 13 11:54:05 example.com dockerd[865]: time="2017-02-13T11:52:40.063994820-05:00" level=warning msg="Health check for container 484ba5dd5c8aa113f86e0faf3abc3f4e2f9cbdfcf0800750023d979d2e96df46 error: rpc error: code = 13 desc = transport is closing"
Feb 13 11:54:05 example.com dockerd[865]: time="2017-02-13T11:52:43.081118258-05:00" level=info msg="libcontainerd: new containerd process, pid: 24964"
...
Feb 13 11:54:05 example.com dockerd[865]: time="2017-02-13T11:53:40.063733001-05:00" level=warning msg="Health check for container 6b862ee77cb8f67ee248068041a9492f3b84b158a4b23b3c91fde26d658f9e22 error: rpc error: code = 4 desc = context deadline exceeded"
Feb 13 11:54:05 example.com dockerd[865]: time="2017-02-13T11:53:45.150569781-05:00" level=warning msg="Health check for container 484ba5dd5c8aa113f86e0faf3abc3f4e2f9cbdfcf0800750023d979d2e96df46 error: rpc error: code = 4 desc = context deadline exceeded"
Feb 13 11:54:05 example.com dockerd[865]: 2017/02/13 11:53:48 grpc: Server.processUnaryRPC failed to write status stream error: code = 4 desc = "context deadline exceeded"
Feb 13 11:54:05 example.com dockerd[865]: 2017/02/13 11:53:48 grpc: Server.processUnaryRPC failed to write status: stream error: code = 4 desc = "context deadline exceeded"
Feb 13 11:54:05 example.com dockerd[865]: panic: close of nil channel
Feb 13 11:54:05 example.com dockerd[865]: goroutine 155 [running]:
Feb 13 11:54:05 example.com dockerd[865]: panic(0x8078a0, 0xc4203c3d50)
Feb 13 11:54:05 example.com dockerd[865]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
Feb 13 11:54:05 example.com dockerd[865]: github.com/docker/containerd/supervisor.(*Supervisor).execExit.func1(0xc4203c70e0, 0xc42017ea90, 0x0)
Feb 13 11:54:05 example.com dockerd[865]: /go/src/github.com/docker/containerd/supervisor/exit.go:90 +0x10c
Feb 13 11:54:05 example.com dockerd[865]: created by github.com/docker/containerd/supervisor.(*Supervisor).execExit
Feb 13 11:54:05 example.com dockerd[865]: /go/src/github.com/docker/containerd/supervisor/exit.go:91 +0xee
Feb 13 11:54:05 example.com dockerd[865]: time="2017-02-13T11:54:01.384438045-05:00" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
Feb 13 11:54:05 example.com dockerd[865]: time="2017-02-13T11:54:01.907649292-05:00" level=info msg="libcontainerd: new containerd process, pid: 25197"
Feb 13 11:54:50 example.com dockerd[865]: time="2017-02-13T11:54:50.594564930-05:00" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
Feb 13 11:54:51 example.com dockerd[865]: time="2017-02-13T11:54:50.599276124-05:00" level=info msg="libcontainerd: new containerd process, pid: 25348"
Feb 13 11:54:52 example.com dockerd[865]: time="2017-02-13T11:54:50.605611385-05:00" level=warning msg="Health check for container 6b862ee77cb8f67ee248068041a9492f3b84b158a4b23b3c91fde26d658f9e22 error: rpc error: code = 13 desc = transport is closing"
Feb 13 11:54:56 example.com dockerd[865]: time="2017-02-13T11:54:50.679610404-05:00" level=warning msg="Health check for container 484ba5dd5c8aa113f86e0faf3abc3f4e2f9cbdfcf0800750023d979d2e96df46 error: rpc error: code = 14 desc = grpc: the connection is unavailable"
Feb 13 11:54:56 example.com dockerd[865]: time="2017-02-13T11:54:51.407090487-05:00" level=info msg="libcontainerd: new containerd process, pid: 25356"
Additional information you deem important (e.g. issue happens only occasionally): We were only able to reproduce this 1/10 tries or so
Output of docker version
:
Client:
Version: 1.13.0
API version: 1.25
Go version: go1.7.3
Git commit: 49bf474
Built: Tue Jan 17 09:55:28 2017
OS/Arch: linux/amd64
Server:
Version: 1.13.0
API version: 1.25 (minimum version 1.12)
Go version: go1.7.3
Git commit: 49bf474
Built: Tue Jan 17 09:55:28 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info
:
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 14
Server Version: 1.13.0
Storage Driver: overlay
Backing Filesystem: xfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e)
runc version: 2f7393a47307a16f8cee44a37b262e8b81021e3e
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.6.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.388 GiB
Name: example.com
ID: XQM5:RIEE:MFIW:YIFG:DAAC:ZYZC:RX2V:JZJI:3MXU:TF3P:V2D6:4TBL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.): Running on AWS
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 26 (13 by maintainers)
Just to say that I just had this issue and removing the directory solved the problem. The message appeared at a rate of about once every 3 seconds.
I have also gotten the following messages. Not sure if they are related:
dockerd[55434]: time=“2017-02-13T22:23:31.760606884Z” level=debug msg=“containerd: error loading process open /var/run/docker/libcontainerd/containerd/564d9c1865ae632aae883bc33b7f965704a3424b564cb8603d4a3e151811816c/init/pid: no such file or directory” id=564d9c1865ae632aae883bc33b7f965704a3424b564cb8603d4a3e151811816c pid=init (about 10 times per second)
dockerd[55434]: time=“2017-02-13T22:34:28.711468652Z” level=error msg=“containerd: notify OOM events” error=“no init process found” (about 10 times per second)
Will be fixed by https://github.com/docker/docker/issues/31662 in 17.03.1