moby: commands such as 'docker run' and 'docker ps' appear to hang indefinitely due to huge request backlog (congestion) in containerd

tl;dr;

There is a single goroutine in containerd that handles incoming requests sequentially. This goroutine can become a bottleneck if requests arrive faster in ‘s.tasks’ via [2] than they can be processed at [1].

from supervisor/supervisor.go:

     16 const (
     17         defaultBufferSize = 2048
     18 )

     21 func New(...) (*Supervisor, error) {
     :
     34         s := &Supervisor{
     :
     40                 tasks:       make(chan Task, defaultBufferSize),
     :
     46         }
     :
     55         return s, nil
     56 }

    155 type Supervisor struct {
     :
    169         tasks          chan Task
     :
    174 }

    267 func (s *Supervisor) Start() error {
     :
    275         go func() {
[1] 276                 for i := range s.tasks {
    277                         s.handleTask(i)
    278                 }
    279         }()
     :
    281 }

    290 func (s *Supervisor) SendTask(evt Task) {
     :
[2] 292         s.tasks <- evt
    293 }

we had reports of dockerd becoming unresponsive. Core dumps of dockerd and containerd show that dockerd is a victim of congestion in containerd:

  • The goroutine shown at lines 275…279 in the snippets of code above is handling an ‘AddProcessTask’ request. Stack trace looks like this.

    waitForCreate createCmd Exec addProcess handleTask Start

  • The ‘s.tasks’ queue is full (2048 requests are already queued here).

  • Many goroutines are blocked in the SendTask() function at [2] because the ‘s.tasks’ queue is full.

The reporter is using the HEALTHCHECK feature of dockerd (typical parameters in Dockerfile are ‘–interval=10’ and ‘–timeout=5’). The majority of the requests seen in the core dump of containerd are ‘AddProcessTask’ issued by dockerd in order to run the container health check programs (typically ‘curl -f http://localhost:8080/{some_path_name}’).

The problem is easily reproducible by adding some instrumentation to the docker-containerd-shim script that slows down the creation of a process in a container. Please see the details in the attachment to this comment.

dockerd runs the monitor() function (see daemon/health.go) in a separate goroutine for each container in order to perform health check monitoring. The function periodically issues a request to containerd to execute the health check program in the container. In an application environment that utilizes the HEALTHCHECK feature extensively, it is possible to have the following situation where multiple monitor() goroutines in dockerd are waiting for responses from containerd to ‘AddProcessTask’ requests:

    dockerd                              containerd
.-------------.   .------------------------------------------------------.
|             |   |                                                      |
| monitor(1)--|---|--goroutine(1)                                        |
|             |   |   SendTask--->.-------.                              |
|             |   |               |       |                              |
| monitor(2)--|---|--goroutine(2) |       |    single handler goroutine  |
|             |   |   SendTask--->|       |   .------------------------. |
|             |   |               |s.tasks|-->| for i := range s.tasks | |
|             |   |               | queue |   |     s.handleTask(i)    | |
|             |   |               |       |   |         addProcess     | |
| monitor(n)--|---|--goroutine(n) |       |   |            ...         | |
|             |   |   SendTask--->|       |   '------------------------' |
|             |   |               '-------'                              |
|             |   |                                                      |
'-------------'   '------------------------------------------------------'

If a monitor goroutine does not receive a response from containerd within the timeout period, the request gets canceled. This is indicated by log messages like the following:

  Health check error: rpc error: code = 4 desc = context deadline exceeded

However, the problem is that the request only gets canceled on the client- side in dockerd but not on the server-side in containerd. So if requests from monitor(1…n) get canceled in dockerd, goroutine(1…n) in containerd remain active and continue to contribute to the backlog in the ‘s.tasks’ queue. Three states/cases need to be considered here:

(a) A request is currently being processed by the ‘handler goroutine’. (b) A request has been added to the ‘s.tasks’ queue but not processed yet. © A request cannot be added to ‘s.tasks’ because the queue is full.

It seems that upstream patch https://github.com/docker/containerd/pull/336 (“Honor context termination on Start and Exec”) only addresses case (a) so this does not look like a comprehensive fix to the problem. Goroutines in state (b) - waiting for their request to be picked up from ‘s.task’ - and goroutines in state © - waiting for free space in ‘s.tasks’ - are still contributing to the backlog and their requests might eventually be handled in the future but no goroutine in dockerd is expecting a response anymore. I believe a comprehensive fix should terminate goroutines in state (b) and in state © too.

Long story

description of ‘reproducer’

  • I used the following Dockerfile to build a simple container image. The health check interval and timeout values are both set to 1 so that dockerd will issue a request to run the health check program at least once every 2 seconds (even if containerd is unresponsive and each request times out).
# cat Dockerfile
FROM rhel7:latest
USER root
MAINTAINER Maintainer_Name

HEALTHCHECK --interval=1s --timeout=1s CMD exit 0
CMD sleep 365d

# docker build .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM rhel7:latest
 ---> e8e3aaf82af5
Step 2 : USER root
 ---> Running in f519b6635f6e
 ---> fbfd6d429c46
Removing intermediate container f519b6635f6e
Step 3 : MAINTAINER Maintainer_Name
 ---> Running in 4d76869e4f5f
 ---> 35b41186b673
Removing intermediate container 4d76869e4f5f
Step 4 : HEALTHCHECK --interval=1s --timeout=1s CMD exit 0
 ---> Running in 6ffee07632b0
 ---> 4686b7656815
Removing intermediate container 6ffee07632b0
Step 5 : CMD sleep 365d
 ---> Running in 3cf1d4c32bbb
 ---> bdb595c0e979
Removing intermediate container 3cf1d4c32bbb
Successfully built bdb595c0e979

# docker tag bdb595c0e979 healthcheck-test
  • I applied the following change/instrumentation to the containerd-shim script. The ‘sleep 3s’ command causes a delay of three seconds each time when containerd handles a request to execute a program in a container for example, a request to execute the health check program. As I mentioned above, dockerd issues requests to execute the health check program at least once every 2 seconds whereas - due to the instrumentation - it takes at least three seconds to handle each of those requests. Hence, a backlog of requests will gradually increase in containerd.
# diff -up /usr/bin/docker-containerd-shim.orig /usr/bin/docker-containerd-shim
--- /usr/bin/docker-containerd-shim.orig	2017-01-11 23:49:24.000000000 +0100
+++ /usr/bin/docker-containerd-shim	2017-02-25 13:43:19.929000000 +0100
@@ -1,4 +1,7 @@
 #!/bin/sh
+
+sleep 3s
+
 . /etc/sysconfig/docker
 [ -e "${DOCKER_CONTAINERD_SHIM_BINARY}" ] || DOCKER_CONTAINERD_SHIM_BINARY=/usr/bin/docker-containerd-shim-current
 if [ ! -f /usr/bin/docker-containerd-shim-current ]; then
  • The state of the container changes to ‘unhealthy’ soon after it has been started. This is because each of the requests from dockerd to containerd to run the health check program actually times out.
# docker images | grep -e "IMAGE ID" -e bdb595c0e979
REPOSITORY                                                   TAG                 IMAGE ID            CREATED             SIZE
healthcheck-test                                             latest              bdb595c0e979        2 minutes ago       192.5 MB

# docker ps -a | grep -e NAMES -e healthcheck-test
CONTAINER ID        IMAGE                                                        COMMAND                  CREATED             STATUS                    PORTS               NAMES
5b83606efc34        healthcheck-test                                             "/bin/sh -c 'sleep 36"   3 seconds ago       Created                                       healthcheck-test

# docker start 5b83606efc34
5b83606efc34

# docker ps -a | grep -e NAMES -e healthcheck-test
CONTAINER ID        IMAGE                                                        COMMAND                  CREATED              STATUS                     PORTS               NAMES
5b83606efc34        healthcheck-test                                             "/bin/sh -c 'sleep 36"   About a minute ago   Up 7 seconds (unhealthy)                       healthcheck-test
  • About two hours after the container has been started, I executed the following commands. Both commands appeared to hang indefinitely.
# date ; ps -ef | grep "sleep 365d" | grep -v grep
Sat Feb 25 16:00:48 CET 2017
root     29452 29391  0 14:02 ?        00:00:00 sleep 365d

# docker run -i -t --rm registry.access.redhat.com/rhel7 sleep 1

# docker ps


- In this situation I created a core dump of containerd and of dockerd.


# systemctl status docker.service
   ...
   CGroup: /system.slice/docker.service
           ├─ 1155 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization...
           ├─ 1559 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metric...
           ...

# gcore -o docker-containerd-current_core 1559
...
Saved corefile docker-containerd-current_core.1559

gcore -o dockerd-current_core 1155
...
Saved corefile dockerd-current_core.1155

details from dockerd-current_core.1155

This goroutine is involved in processing the command ‘docker run -i -t --rm registry.access.redhat.com/rhel7 sleep 1’. The goroutine has issued a request to containerd to start the entrypoint image (sleep 1) of the container. The goroutine is holding a lock on container ID 1666f64b74c51733506a0b182ec966024fb6855d0bd3e01256e77258dc345b0b while it is waiting for a response from containerd (see also goroutine 8609 of containerd shown further below).

goroutine 65557 (pc = 0x4438ea  sp = 0xc4229e24f0  g = 0xc420080820)


(gdb) bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4229e2520, lock=0x0, reason=..., traceEv=24 '\030', traceskip=2)
    at /usr/lib/golang/src/runtime/proc.go:260
#1  0x0000000000453069 in runtime.selectgoImpl (sel=0xc4229e27c0, ~r1=0, ~r2=0xc4229e2650) at /usr/lib/golang/src/runtime/select.go:423
#2  0x0000000000451e5c in runtime.selectgo (sel=0xc4229e27c0) at /usr/lib/golang/src/runtime/select.go:238
#3  0x0000000000b242eb in google.golang.org/grpc/transport.(*Stream).Header (s=0xc421f3e0e0, ~r0=0x0, ~r1=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/vendor/src/google.golang.org/grpc/transport/transport.go:221
#4  0x00000000009ae6f2 in google.golang.org/grpc.recvResponse (dopts=..., t=..., c=0xc42182f2c0, stream=0xc421f3e0e0, reply=..., ~r5=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/vendor/src/google.golang.org/grpc/call.go:54
#5  0x00000000009af25e in google.golang.org/grpc.Invoke (ctx=..., method=..., args=..., reply=..., cc=0xc4204338c0, opts=..., err=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/vendor/src/google.golang.org/grpc/call.go:178
#6  0x00000000008e57a2 in github.com/docker/containerd/api/grpc/types.(*aPIClient).CreateContainer (c=0xc42033a2d8, ctx=..., in=0xc42107e9c0, opts=..., 
    ~r3=0x0, ~r4=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/vendor/src/github.com/docker/containerd/api/grpc/types/api.pb.go:927
#7  0x00000000005bdc00 in github.com/docker/docker/libcontainerd.(*container).start (ctr=0xc42107e840, attachStdio=
    {void (github.com/docker/docker/libcontainerd.IOPipe, error *)} 0xc4229e2c40, ~r1=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/libcontainerd/container_linux.go:152
#8  0x00000000005b866e in github.com/docker/docker/libcontainerd.(*client).Create (clnt=0xc420471f80, containerID=..., spec=..., attachStdio=
    {void (github.com/docker/docker/libcontainerd.IOPipe, error *)} 0xc4229e3290, options=..., err=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/libcontainerd/client_linux.go:199
#9  0x000000000058d223 in github.com/docker/docker/daemon.(*Daemon).containerStart (daemon=0xc4204151e0, container=0xc421eafa40, err=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/daemon/start.go:146
#10 0x000000000058c84e in github.com/docker/docker/daemon.(*Daemon).ContainerStart (daemon=0xc4204151e0, name=..., hostConfig=0x0, ~r2=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/daemon/start.go:82
#11 0x00000000004f6512 in github.com/docker/docker/api/server/router/container.(*containerRouter).postContainersStart (s=0xc4203ff880, ctx=..., w=..., 
    r=0xc420ff5c20, vars=0xc421aa1980, ~r4=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/api/server/router/container/container_routes.go:154
    ...
#21 0x00000000004e6670 in github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP (rs=0xc420bf2dd0, w=..., r=0xc420ff5c20)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/api/server/router_swapper.go:29
#22 0x000000000078b6dd in net/http.serverHandler.ServeHTTP (sh=..., rw=..., req=0xc420ff5c20) at /usr/lib/golang/src/net/http/server.go:2202
#23 0x0000000000788167 in net/http.(*conn).serve (c=0xc422381380, ctx=...) at /usr/lib/golang/src/net/http/server.go:1579
#24 0x0000000000474901 in runtime.goexit () at /usr/lib/golang/src/runtime/asm_amd64.s:2086


(gdb) up 9
#9  0x000000000058d223 in github.com/docker/docker/daemon.(*Daemon).containerStart (daemon=0xc4204151e0, container=0xc421eafa40, err=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/daemon/start.go:146

(gdb) print ((struct 'github.com/docker/docker/container.CommonContainer' *)container)->ID
$1 = {
  str = 0xc4227ffe00 "1666f64b74c51733506a0b182ec966024fb6855d0bd3e01256e77258dc345b0b",
  len = 64
}

(gdb) print ((struct 'github.com/docker/docker/container.CommonContainer' *)container)->Config.Image
$2 = {
  str = 0xc421f1c2c0 "registry.access.redhat.com/rhel7",
  len = 32
}

(gdb) print ((struct 'github.com/docker/docker/container.CommonContainer' *)container)->Config.Cmd.array[0]
$3 = {
  str = 0xc4219b2910 "sleep",
  len = 5
}
(gdb) print ((struct 'github.com/docker/docker/container.CommonContainer' *)container)->Config.Cmd.array[1]
$4 = {
  str = 0xc4219b2915 "1",
  len = 1
}


(gdb) up 13
#21 0x00000000004e6670 in github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP (rs=0xc420bf2dd0, w=..., r=0xc420ff5c20)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/api/server/router_swapper.go:29

(gdb) print *r
$5 = {
  Method = {
    str = 0xc420b66d20 "POST",
    len = 4
  }, 
  ...
  Proto = {
    str = 0xc420b66d7e "HTTP/1.1", 
    len = 8
  }, 
  ...
  RequestURI = {
    str = 0xc420b66d25 "/v1.24/containers/1666f64b74c51733506a0b182ec966024fb6855d0bd3e01256e77258dc345b0b/start",
    len = 88
  }, 
  ...
}

Several goroutines - including the one that is handling the request from ‘docker ps’ - are blocked by the lock of container ID 1666f64b74c51733506a0b182ec966024fb6855d0bd3e01256e77258dc345b0b. Here is a random example of such a goroutine.

goroutine 65650 (pc = 0x4438ea  sp = 0xc4221551a0  g = 0xc420eba340)


(gdb) bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4221551d0, lock=0x2258660 <runtime.semtable+2688>, reason=..., traceEv=25 '\031', 
    traceskip=4) at /usr/lib/golang/src/runtime/proc.go:260
#1  0x00000000004439de in runtime.goparkunlock (lock=0x2258660 <runtime.semtable+2688>, reason=..., traceEv=25 '\031', traceskip=4)
    at /usr/lib/golang/src/runtime/proc.go:265
#2  0x0000000000453bdd in runtime.semacquire (addr=0xc420b5a704, profile=true) at /usr/lib/golang/src/runtime/sema.go:111
#3  0x0000000000453870 in sync.runtime_Semacquire (addr=0xc420b5a704) at /usr/lib/golang/src/runtime/sema.go:47
#4  0x00000000006a8160 in sync.(*Mutex).Lock (m=0xc420b5a700) at /usr/lib/golang/src/sync/mutex.go:85
#5  0x0000000000577a90 in github.com/docker/docker/daemon.(*Daemon).reducePsContainer (daemon=0xc4204151e0, container=0xc421eafa40, ctx=0xc421db46c0, 
    reducer=
    {void (struct github.com/docker/docker/container.Container *, struct github.com/docker/docker/daemon.listContext *, struct github.com/docker/engine-api/types.Container **, error *)} 0xc422155340, ~r3=0x0, ~r4=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/daemon/list.go:204
#6  0x00000000005777d4 in github.com/docker/docker/daemon.(*Daemon).reduceContainers (daemon=0xc4204151e0, config=0xc421cfbcc0, reducer=
    {void (struct github.com/docker/docker/container.Container *, struct github.com/docker/docker/daemon.listContext *, struct github.com/docker/engine-api/types.Container **, error *)} 0xc4221553e8, ~r2=..., ~r3=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/daemon/list.go:187
#7  0x0000000000576c2d in github.com/docker/docker/daemon.(*Daemon).Containers (daemon=0xc4204151e0, config=0xc421cfbcc0, ~r1=..., ~r2=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/daemon/list.go:101
#8  0x00000000004f5510 in github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersJSON (s=0xc4203ff880, ctx=..., w=..., 
    r=0xc4210fa3c0, vars=0xc4221d6090, ~r4=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/api/server/router/container/container_routes.go:51
    ...
#18 0x00000000004e6670 in github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP (rs=0xc420bf2dd0, w=..., r=0xc4210fa3c0)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/api/server/router_swapper.go:29
#19 0x000000000078b6dd in net/http.serverHandler.ServeHTTP (sh=..., rw=..., req=0xc4210fa3c0) at /usr/lib/golang/src/net/http/server.go:2202
#20 0x0000000000788167 in net/http.(*conn).serve (c=0xc421a62f00, ctx=...) at /usr/lib/golang/src/net/http/server.go:1579
#21 0x0000000000474901 in runtime.goexit () at /usr/lib/golang/src/runtime/asm_amd64.s:2086


(gdb) up 5
#5  0x0000000000577a90 in github.com/docker/docker/daemon.(*Daemon).reducePsContainer (daemon=0xc4204151e0, container=0xc421eafa40, ctx=0xc421db46c0, 
    reducer=
    {void (struct github.com/docker/docker/container.Container *, struct github.com/docker/docker/daemon.listContext *, struct github.com/docker/engine-api/types.Container **, error *)} 0xc422155340, ~r3=0x0, ~r4=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/daemon/list.go:204

(gdb) print ((struct 'github.com/docker/docker/container.CommonContainer' *)container)->ID
$6 = {
  str = 0xc4227ffe00 "1666f64b74c51733506a0b182ec966024fb6855d0bd3e01256e77258dc345b0b",
  len = 64
}


(gdb) up 13
#18 0x00000000004e6670 in github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP (rs=0xc420bf2dd0, w=..., r=0xc4210fa3c0)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/docker/api/server/router_swapper.go:29

(gdb) print *r
$7 = {
  Method = {
    str = 0xc421b89ec0 "GET",
    len = 3
  }, 
  ...
  Proto = {
    str = 0xc421b89ee3 "HTTP/1.1", 
    len = 8
  }, 
  ...
  RequestURI = {
    str = 0xc421b89ec4 "/containers/json?all=1&limit=0",
    len = 30
  }, 
  ...
}

details from docker-containerd-current_core.1559

This goroutine is handling a request to run the health check program of container ID 5b83606efc34e8490e1e3ff0c9ffc40761f5e4b518a888054521298dd51ec27b. The goroutine still has to process a huge backlog of requests that are queued in ‘s.tasks’.

goroutine 39 (pc = 0x43175a  sp = 0xc4236bd698  g = 0xc42022e9c0)


gdb) bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4236bd6c8, lock=0x0, reason=..., traceEv=24 '\030', traceskip=2)
    at /usr/lib/golang/src/runtime/proc.go:260
#1  0x0000000000440e59 in runtime.selectgoImpl (sel=0xc4236bd9a0, ~r1=0, ~r2=0xc4236bd800) at /usr/lib/golang/src/runtime/select.go:423
#2  0x000000000043fc4c in runtime.selectgo (sel=0xc4236bd9a0) at /usr/lib/golang/src/runtime/select.go:238
#3  0x0000000000689b25 in github.com/docker/containerd/runtime.(*container).waitForCreate (c=0xc42058c000, p=0xc4236e36c0, cmd=0xc420082000, ~r2=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/runtime/container.go:652
#4  0x0000000000688529 in github.com/docker/containerd/runtime.(*container).createCmd (c=0xc42058c000, pid=..., cmd=0xc420082000, p=0xc4236e36c0, ~r3=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/runtime/container.go:515
#5  0x000000000068815f in github.com/docker/containerd/runtime.(*container).Exec (c=0xc42058c000, pid=..., pspec=..., s=..., pp=..., err=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/runtime/container.go:470
#6  0x00000000005231a9 in github.com/docker/containerd/supervisor.(*Supervisor).addProcess (s=0xc4201a40c0, t=0xc420998d90, ~r1=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/supervisor/add_process.go:29
#7  0x0000000000529673 in github.com/docker/containerd/supervisor.(*Supervisor).handleTask (s=0xc4201a40c0, i=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/supervisor/supervisor.go:373
#8  0x000000000052c070 in github.com/docker/containerd/supervisor.(*Supervisor).Start.func1 (s=0xc4201a40c0)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/supervisor/supervisor.go:277
#9  0x0000000000462411 in runtime.goexit () at /usr/lib/golang/src/runtime/asm_amd64.s:2086


(gdb) up 3
#3  0x0000000000689b25 in github.com/docker/containerd/runtime.(*container).waitForCreate (c=0xc42058c000, p=0xc4236e36c0, cmd=0xc420082000, ~r2=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/runtime/container.go:652

(gdb) print p.spec.Args.len
$1 = 3
(gdb) print p.spec.Args.array[0]
$2 = {
  str = 0xc420aa04b0 "/bin/sh", 
  len = 7
}
(gdb) print p.spec.Args.array[1]
$3 = {
  str = 0xc420aa04b8 "-c",
  len = 2
}
(gdb) print p.spec.Args.array[2]
$4 = {
  str = 0xc420aa04ba "exit 0",
  len = 6
}

(gdb) print p.container.id
$5 = {
  str = 0xc420632080 "5b83606efc34e8490e1e3ff0c9ffc40761f5e4b518a888054521298dd51ec27b", 
  len = 64            ^^^^^^^^^^^^
}

(gdb) print p.container.processes.count
$6 = 1

(gdb) print p.container.processes.buckets[0]
$7 = {
  ...
  keys = {{
      str = 0x87c629 "init",
      len = 4
    }, {
      ...
    }}, 
  values = {0xc42022fd40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  ...
}

(gdb) print ((struct 'github.com/docker/containerd/runtime.process' *)0xc42022fd40)->pid
$8 = 29452

(gdb) print ((struct 'github.com/docker/containerd/runtime.process' *)0xc42022fd40)->spec.Args.len
$9 = 3
(gdb) print ((struct 'github.com/docker/containerd/runtime.process' *)0xc42022fd40)->spec.Args.array[0]
$10 = {
  str = 0xc42058a129 "/bin/sh",
  len = 7
}
(gdb) print ((struct 'github.com/docker/containerd/runtime.process' *)0xc42022fd40)->spec.Args.array[1]
$11 = {
  str = 0xc42058a130 "-c",
  len = 2
}
(gdb) print ((struct 'github.com/docker/containerd/runtime.process' *)0xc42022fd40)->spec.Args.array[2]
$12 = {
  str = 0xc42058a132 "sleep 365d", 
  len = 10
}


(gdb) up 5
#8  0x000000000052c070 in github.com/docker/containerd/supervisor.(*Supervisor).Start.func1 (s=0xc4201a40c0)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/supervisor/supervisor.go:277

(gdb) print *s.tasks
$13 = {
  qcount = 2048, <--- a backlog of 2048 requests that have not been processed yet
  dataqsiz = 2048, 
  buf = 0xc4201ea000, 
  elemsize = 16, 
  ...
}

(gdb) x/4096xg 0xc4201ea000
0xc4201ea000:	0x0000000000c7ec40	0x000000c4203b2070 <- [*] see below
0xc4201ea010:	0x0000000000c7ec40	0x000000c4222917a0
0xc4201ea020:	0x0000000000c7ec40	0x000000c422291880
...
0xc4201f1fd0:	0x0000000000c7ec40	0x000000c421d67ce0
0xc4201f1fe0:	0x0000000000c7ec40	0x000000c421d67dc0
0xc4201f1ff0:	0x0000000000c7ec40	0x000000c4222916c0

(gdb) x/xg 0x0000000000c7ec40
0xc7ec40 <go.itab.*github.com/docker/containerd/supervisor.AddProcessTask,github.com/docker/containerd/supervisor.Task>:	0x0000000000803f60

Request [*] has been queued in ‘s.tasks’ by goroutine 5226. This is a request to run the health check program of container ID 5b83606efc34e8490e1e3ff0c9ffc40761f5e4b518a888054521298dd51ec27b.

(gdb) print ((struct 'github.com/docker/containerd/supervisor.AddProcessTask' *)0x000000c4203b2070)->ID
$14 = {
  str = 0xc421fb1500 "5b83606efc34e8490e1e3ff0c9ffc40761f5e4b518a888054521298dd51ec27b",
  len = 64
}

(gdb) print ((struct 'github.com/docker/containerd/supervisor.AddProcessTask' *)0x000000c4203b2070)->ProcessSpec.Args.len
$15 = 3
(gdb) print ((struct 'github.com/docker/containerd/supervisor.AddProcessTask' *)0x000000c4203b2070)->ProcessSpec.Args.array[0]
$16 = {
  str = 0xc4222640d0 "/bin/sh", 
  len = 7
}
(gdb) print ((struct 'github.com/docker/containerd/supervisor.AddProcessTask' *)0x000000c4203b2070)->ProcessSpec.Args.array[1]
$17 = {
  str = 0xc4222640d8 "-c",
  len = 2
}
(gdb) print ((struct 'github.com/docker/containerd/supervisor.AddProcessTask' *)0x000000c4203b2070)->ProcessSpec.Args.array[2]
$18 = {
  str = 0xc4222640da "exit 0",
  len = 6
}

(gdb) print ((struct 'github.com/docker/containerd/supervisor.baseTask' *)0x000000c4203b2070)->errCh
$19 = (chan error) 0xc420463c20

(gdb) print ((struct 'github.com/docker/containerd/supervisor.baseTask' *)0x000000c4203b2070)->errCh.recvq.first.g.goid
$20 = 5226

This goroutine needs to wait until goroutine 39 picks request [] from ‘s.tasks’ and processes the request. This can take a long time because there are many requests in the backlog that will be processed before goroutine 39 gets to request [].

goroutine 5226 (pc = 0x43175a  sp = 0xc422165878  g = 0xc421b73ba0)


(gdb) bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4221658a8, lock=0xc420463c78, reason=..., traceEv=23 '\027', traceskip=3)
    at /usr/lib/golang/src/runtime/proc.go:260
#1  0x000000000043184e in runtime.goparkunlock (lock=0xc420463c78, reason=..., traceEv=23 '\027', traceskip=3) at /usr/lib/golang/src/runtime/proc.go:265
#2  0x00000000004094cf in runtime.chanrecv (t=0x7d3480, c=0xc420463c20, ep=0xc422165a40, block=true, selected=120, received=32)
    at /usr/lib/golang/src/runtime/chan.go:496            ^^^^^^^^^^^^
                                                          see 'errCh' above
#3  0x0000000000409195 in runtime.chanrecv1 (t=0x7d3480, c=0xc420463c20, elem=0xc422165a40) at /usr/lib/golang/src/runtime/chan.go:378
#4  0x0000000000519b9f in github.com/docker/containerd/api/grpc/server.(*apiServer).AddProcess (s=0xc420122400, ctx=..., r=0xc421b75e00, ~r2=0x0, ~r3=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/api/grpc/server/server_linux.go:54
#5  0x000000000051f44d in github.com/docker/containerd/api/grpc/types._API_AddProcess_Handler (srv=..., ctx=..., dec=
    {void (interface {}, error *)} 0xc422165b38, interceptor=
    {void (golang.org/x/net/context.Context, interface {}, struct google.golang.org/grpc.UnaryServerInfo *, void (golang.org/x/net/context.Context, interface {}, interface {} *, error *), interface {} *, error *)} 0xc422165b40, ~r4=..., ~r5=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/api/grpc/types/api.pb.go:1164
#6  0x00000000004f3cb4 in google.golang.org/grpc.(*Server).processUnaryRPC (s=0xc4201d2090, t=..., stream=0xc421c32000, srv=0xc4201d11a0, 
    md=0xc782b8 <github.com/docker/containerd/api/grpc/types.statictmp_438+120>, trInfo=0xc421fc57a0, err=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/containerd-471f03c11413d9ab1523de24d3e79ae3a7b8126e/vendor/src/google.golang.org/grpc/server.go:522
#7  0x00000000004f57cd in google.golang.org/grpc.(*Server).handleStream (s=0xc4201d2090, t=..., stream=0xc421c32000, trInfo=0xc421fc57a0)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/containerd-471f03c11413d9ab1523de24d3e79ae3a7b8126e/vendor/src/google.golang.org/grpc/server.go:682
#8  0x00000000004f730b in google.golang.org/grpc.(*Server).serveStreams.func1.1 (&wg=0xc42024a040, s=0xc4201d2090, st=..., stream=0xc421c32000)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/containerd-471f03c11413d9ab1523de24d3e79ae3a7b8126e/vendor/src/google.golang.org/grpc/server.go:348
#9  0x0000000000462411 in runtime.goexit () at /usr/lib/golang/src/runtime/asm_amd64.s:2086

This goroutine is involved in processing the command ‘docker run -i -t --rm registry.access.redhat.com/rhel7 sleep 1’. The goroutine has received a request from goroutine 65557 in dockerd (see further above) to start the entrypoint image (sleep 1) of container ID 1666f64b74c51733506a0b182ec966024fb6855d0bd3e01256e77258dc345b0b. The goroutine cannot add the request to ‘s.tasks’ at the moment because the queue is full (there are already 2048 requests in the queue). goroutine 65557 in dockerd is waiting for a response from this goroutine in containerd.

goroutine 8609 (pc = 0x43175a  sp = 0xc4238f5888  g = 0xc4237f2ea0)


(gdb) bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4238f58b8, lock=0xc420146ad8, reason=..., traceEv=22 '\026', traceskip=3)
    at /usr/lib/golang/src/runtime/proc.go:260
#1  0x000000000043184e in runtime.goparkunlock (lock=0xc420146ad8, reason=..., traceEv=22 '\026', traceskip=3) at /usr/lib/golang/src/runtime/proc.go:265
#2  0x00000000004087d8 in runtime.chansend (t=0x7d3600, c=0xc420146a80, ep=0xc4238f59e8, block=true, callerpc=5408685, ~r5=45)
    at /usr/lib/golang/src/runtime/chan.go:223
#3  0x000000000040859d in runtime.chansend1 (t=0x7d3600, c=0xc420146a80, elem=0xc4238f59e8) at /usr/lib/golang/src/runtime/chan.go:107
#4  0x00000000005287ad in github.com/docker/containerd/supervisor.(*Supervisor).SendTask (s=0xc4201a40c0, evt=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/supervisor/supervisor.go:292
#5  0x0000000000514e45 in github.com/docker/containerd/api/grpc/server.(*apiServer).CreateContainer (s=0xc420122400, ctx=..., c=0xc423285e00, ~r2=0x0, 
    ~r3=...) at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/api/grpc/server/server.go:65
#6  0x000000000051e84d in github.com/docker/containerd/api/grpc/types._API_CreateContainer_Handler (srv=..., ctx=..., dec=
    {void (interface {}, error *)} 0xc4238f5b38, interceptor=
    {void (golang.org/x/net/context.Context, interface {}, struct google.golang.org/grpc.UnaryServerInfo *, void (golang.org/x/net/context.Context, interface {}, interface {} *, error *), interface {} *, error *)} 0xc4238f5b40, ~r4=..., ~r5=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/api/grpc/types/api.pb.go:1092
#7  0x00000000004f3cb4 in google.golang.org/grpc.(*Server).processUnaryRPC (s=0xc4201d2090, t=..., stream=0xc4232ff5e0, srv=0xc4201d11a0, 
    md=0xc78258 <github.com/docker/containerd/api/grpc/types.statictmp_438+24>, trInfo=0xc4237b1fb0, err=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/containerd-471f03c11413d9ab1523de24d3e79ae3a7b8126e/vendor/src/google.golang.org/grpc/server.go:522
#8  0x00000000004f57cd in google.golang.org/grpc.(*Server).handleStream (s=0xc4201d2090, t=..., stream=0xc4232ff5e0, trInfo=0xc4237b1fb0)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/containerd-471f03c11413d9ab1523de24d3e79ae3a7b8126e/vendor/src/google.golang.org/grpc/server.go:682
#9  0x00000000004f730b in google.golang.org/grpc.(*Server).serveStreams.func1.1 (&wg=0xc42024a040, s=0xc4201d2090, st=..., stream=0xc4232ff5e0)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/containerd-471f03c11413d9ab1523de24d3e79ae3a7b8126e/vendor/src/google.golang.org/grpc/server.go:348
#10 0x0000000000462411 in runtime.goexit () at /usr/lib/golang/src/runtime/asm_amd64.s:2086


(gdb) up 4
#4  0x00000000005287ad in github.com/docker/containerd/supervisor.(*Supervisor).SendTask (s=0xc4201a40c0, evt=...)
    at /builddir/build/BUILD/docker-047e51b797564227b0bf26f3aa448f563bea5c71/_build/src/github.com/docker/containerd/supervisor/supervisor.go:292

(gdb) print evt
$21 = {
  tab = 0xc7ee40 <com/docker/containerd/supervisor.Task>, 
  data = 0xc420362270
}

(gdb) x/xg 0xc7ee40
0xc7ee40 <go.itab.*github.com/docker/containerd/supervisor.StartTask,github.com/docker/containerd/supervisor.Task>:	0x0000000000803f60

(gdb) print ((struct 'github.com/docker/containerd/supervisor.StartTask' *)0xc420362270)->ID
$22 = {
  str = 0xc42385eb40 "1666f64b74c51733506a0b182ec966024fb6855d0bd3e01256e77258dc345b0b",
  len = 64
}

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 11
  • Comments: 23 (13 by maintainers)

Commits related to this issue

Most upvoted comments

@mlaventure sure, added, we found this in the 1.12.x releases