moby: random extremely slow response times of dockerd

Description

We’re running docker 1.12.4 on AWS on x1.32xlarge instance with Ubuntu 16.04. Randomly, the docker daemon responds very slowly to almost all API calls.

A live example caught when running docker run:

# docker run -it --rm ubuntu ipcs -l
docker: Error response from daemon: shim error: context deadline exceeded.

Logs from this time period:

Dec 21 11:09:58 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:09:58.589572735Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:09:59 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:09:59.498743699Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:23 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:23.123504197Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:23 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:23.124479766Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:23 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:23.125179984Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:25 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:25.924602801Z" level=error msg="containerd: start container" error="shim error: context deadline exceeded" id=14312312e55d35071264d211d2dfd8095454d78c62c39c44fb3e77a2e49f975a
Dec 21 11:10:25 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:25.925227778Z" level=error msg="Create container failed with error: shim error: context deadline exceeded"
Dec 21 11:10:26 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:26.347312214Z" level=error msg="Handler for POST /v1.24/containers/14312312e55d35071264d211d2dfd8095454d78c62c39c44fb3e77a2e49f975a/start returned error: shim error: context deadline exceeded"
Dec 21 11:10:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:57.469826648Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:57.470011788Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:57.470005956Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:57.470051604Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:57.470715026Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:57.498872626Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:58 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:58.590113948Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:58 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:58.590108916Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:10:59 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:10:59.499523428Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:11:23 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:11:23.124304766Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:11:23 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:11:23.125017751Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:11:23 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:11:23.125686946Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:11:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:11:57.471164957Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:11:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:11:57.471337902Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:11:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:11:57.471380104Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
Dec 21 11:11:57 ip-10-69-11-89 dockerd[127631]: time="2016-12-21T11:11:57.471729462Z" level=warning msg="Health check error: rpc error: code = 4 desc = context deadline exceeded"
...
...

Steps to reproduce the issue:

  1. Running dockerd with AUFS graphdriver, live-restore option enabled.

Output of docker version:

Client:
 Version:      1.12.4
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   1564f02
 Built:        Tue Dec 13 00:08:34 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.4
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   1564f02
 Built:        Tue Dec 13 00:08:34 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 443
 Running: 363
 Paused: 0
 Stopped: 80
Images: 1520
Server Version: 1.12.4
Storage Driver: aufs
 Root Dir: /opt/io1/docker/aufs
 Backing Filesystem: extfs
 Dirs: 4340
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-53-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 1.876 TiB
Name: ip-10-69-11-89
ID: UGZS:UFD3:GB4C:W5MX:JU2L:K7PH:6ZWS:4GPM:27Q5:UNNN:X3DC:YDT7
Docker Root Dir: /opt/io1/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional info At that moment system had free CPU time (almost no io_wait) and over 50% of free memory.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 2
  • Comments: 31 (16 by maintainers)

Most upvoted comments

@mlaventure Which docker version is linked to containerd that contains this fix?

I have a similar issue that always occurs with my tests after adding HEALTHCHECK to containers. The issue happens in Docker 1.12.6+ , if I downgrade to 1.12.3 it works fine.

@RRAlex I’d say your issue is different than the one initially reported here, could you please open a new issue with all the details asked by the template?