containerd: containerd hangs
cri-containerd.log:
1205 03:58:34.355341 1929 instrumented_service.go:129] StartContainer for "6b739d7d56993f63f54b9643563ef6739864653dc377a120cbe876685c9fe42c" returns successfully
E1205 03:58:39.830445 1929 instrumented_service.go:52] ListPodSandbox failed, error: failed to list sandbox containers: rpc error: code = DeadlineExceeded desc = context deadline exceeded
E1205 03:58:45.821256 1929 instrumented_service.go:52] ListPodSandbox failed, error: failed to list sandbox containers: rpc error: code = Canceled desc = context canceled
I1205 03:59:05.021052 1929 instrumented_service.go:287] ImageFsInfo
I1205 03:59:05.021131 1929 instrumented_service.go:292] ImageFsInfo returns filesystem info [&FilesystemUsage{Timestamp:1512446340907845245,StorageId:&StorageIdentifier{Uuid:27291a55-d3a1-4729-8b41-397876fd302a,},UsedBytes:&UInt64Value{Value:2505028319,},InodesUsed:&UInt64Value{Value:75245,},}]
E1205 03:59:05.021638 1929 instrumented_service.go:52] ListPodSandbox failed, error: failed to list sandbox containers: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1205 03:59:05.039675 1929 instrumented_service.go:287] ImageFsInfo
I1205 03:59:05.040096 1929 instrumented_service.go:292] ImageFsInfo returns filesystem info [&FilesystemUsage{Timestamp:1512446340907845245,StorageId:&StorageIdentifier{Uuid:27291a55-d3a1-4729-8b41-397876fd302a,},UsedBytes:&UInt64Value{Value:2505028319,},InodesUsed:&UInt64Value{Value:75245,},}]
E1205 03:59:05.053445 1929 instrumented_service.go:52] ListPodSandbox failed, error: failed to list sandbox containers: rpc error: code = DeadlineExceeded desc = context deadline exceeded
E1205 04:00:05.033052 1929 instrumented_service.go:52] ListPodSandbox failed, error: failed to list sandbox containers: rpc error: code = DeadlineExceeded desc = context deadline exceeded
E1205 04:00:05.040684 1929 instrumented_service.go:52] ListPodSandbox failed, error: failed to list sandbox containers: rpc error: code = DeadlineExceeded desc = context deadline exceeded
E1205 04:00:31.804838 1929 sandbox_run.go:216] Failed to delete sandbox container "7c162d9bb254bed8e2c38bcd93340a74e49fd2c08a273b1989e13e4f75bf1554": context deadline exceeded: unknown
E1205 04:00:31.820405 1929 sandbox_run.go:170] Failed to delete containerd container "7c162d9bb254bed8e2c38bcd93340a74e49fd2c08a273b1989e13e4f75bf1554": context deadline exceeded: unknown
containerd.log:
time="2017-12-05T03:56:48Z" level=debug msg="event published" module="containerd/tasks" ns=k8s.io topic="/tasks/delete" type=containerd.events.TaskDelete
time="2017-12-05T03:58:34Z" level=debug msg="event published" module="containerd/events" ns=k8s.io topic="/tasks/exit" type=containerd.events.TaskExit
time="2017-12-05T03:58:34Z" level=debug msg="received signal" module=containerd signal=child exited
time="2017-12-05T03:58:34Z" level=info msg="shim reaped" id=2af28178b9da5c29ec147de86732c502d8132a071cd8dd4c5939f48515a7a6c7 module="containerd/tasks"
time="2017-12-05T03:58:34Z" level=debug msg="event published" module="containerd/tasks" ns=k8s.io topic="/tasks/delete" type=containerd.events.TaskDelete
time="2017-12-05T03:58:34Z" level=debug msg="prepare snapshot" key=6b739d7d56993f63f54b9643563ef6739864653dc377a120cbe876685c9fe42c module="containerd/snapshot" parent="sha256:3e7065e2902c8af4232958c4770df3036568e0b266ccbafefcb8c3c36af4197f"
time="2017-12-05T03:58:34Z" level=debug msg="event published" module="containerd/snapshot" ns=k8s.io topic="/snapshot/prepare" type=containerd.events.SnapshotPrepare
time="2017-12-05T03:58:34Z" level=debug msg="get snapshot mounts" key=6b739d7d56993f63f54b9643563ef6739864653dc377a120cbe876685c9fe42c module="containerd/snapshot"
time="2017-12-05T03:58:34Z" level=debug msg="get snapshot mounts" key=6b739d7d56993f63f54b9643563ef6739864653dc377a120cbe876685c9fe42c module="containerd/snapshot"
time="2017-12-05T03:58:34Z" level=debug msg="event published" module="containerd/containers" ns=k8s.io topic="/containers/create" type=containerd.events.ContainerCreate
time="2017-12-05T03:58:34Z" level=debug msg="get snapshot mounts" key=6b739d7d56993f63f54b9643563ef6739864653dc377a120cbe876685c9fe42c module="containerd/snapshot"
time="2017-12-05T03:58:34Z" level=info msg="shim /home/cri-containerd/usr/local/bin/containerd-shim started" address="/containerd-shim/k8s.io/6b739d7d56993f63f54b9643563ef6739864653dc377a120cbe876685c9fe42c/shim.sock" debug=false module="containerd/tasks" pid=15055
time="2017-12-05T03:58:34Z" level=debug msg="event published" module="containerd/tasks" ns=k8s.io topic="/tasks/create" type=containerd.events.TaskCreate
time="2017-12-05T03:58:34Z" level=debug msg="event published" module="containerd/tasks" ns=k8s.io topic="/tasks/start" type=containerd.events.TaskStart
time="2017-12-05T04:03:10Z" level=debug msg="event published" module="containerd/tasks" ns=k8s.io topic="/tasks/exec-started" type=containerd.events.TaskExecStarted
time="2017-12-05T04:03:10Z" level=debug msg="event published" module="containerd/events" ns=k8s.io topic="/tasks/exit" type=containerd.events.TaskExit
time="2017-12-05T04:04:10Z" level=debug msg="event published" module="containerd/tasks" ns=k8s.io topic="/tasks/exec-started" type=containerd.events.TaskExecStarted
time="2017-12-05T04:04:10Z" level=debug msg="event published" module="containerd/events" ns=k8s.io topic="/tasks/exit" type=containerd.events.TaskExit
time="2017-12-05T04:05:10Z" level=debug msg="event published" module="containerd/tasks" ns=k8s.io topic="/tasks/exec-started" type=containerd.events.TaskExecStarted
time="2017-12-05T04:05:10Z" level=debug msg="event published" module="containerd/events" ns=k8s.io topic="/tasks/exit" type=containerd.events.TaskExit
I’ve seen this twice today. I’ll update with more information if I see it again.
@pires The fix for this should be in containerd 1.0.2.
I’ve seen “runc hung on exec.fifo” a lot in the past when working with linuxkit and/or the swarmd port. It’s been a long while since I last saw it, but IIRC (sorry to be vague here; my recollection is fuzzy past the “exec.fifo was involved” level, and one set of notes I found says just “issues with ttys” 😉) it often turned out to be something preventing the eventual init process from getting as far as launching: missing libraries, or an inability to perform some action on the in-container bring-up path.
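For anyone unfamiliar with why this manifests as a hang on exec.fifo: runc’s create/start split rendezvouses on a named pipe, so if init never reaches its end of the pipe, the other end blocks forever. The Go sketch below illustrates that blocking-FIFO handshake pattern; it is an illustration only, not runc’s actual code, and the /tmp path and names are made up.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

func main() {
	fifo := "/tmp/exec.fifo" // illustrative path; runc keeps its fifo in the container state dir
	_ = syscall.Mkfifo(fifo, 0600)
	defer os.Remove(fifo)

	go func() {
		// "init" side: this open blocks until someone opens the fifo for
		// reading. If container setup fails before this point (missing
		// libraries etc.), control never arrives here.
		w, err := os.OpenFile(fifo, os.O_WRONLY, 0)
		if err != nil {
			fmt.Println("init:", err)
			return
		}
		w.Write([]byte{0}) // signal readiness with a single byte
		w.Close()
		fmt.Println("init: unblocked, would exec the container process now")
	}()

	time.Sleep(100 * time.Millisecond)

	// "start" side: opening for read rendezvouses with the writer above.
	// If "init" never reached its open, this blocks forever -- the
	// "runc hung on exec.fifo" symptom.
	r, err := os.OpenFile(fifo, os.O_RDONLY, 0)
	if err != nil {
		fmt.Println("start:", err)
		return
	}
	buf := make([]byte, 1)
	r.Read(buf)
	r.Close()
	fmt.Println("start: handshake complete")

	time.Sleep(50 * time.Millisecond) // let the goroutine finish printing
}
```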
I would often find that various useful debug output was being swallowed from stdio somewhere along the line. I have a (long-bitrotted) hack in a branch that captured that output, which was often invaluable. Something similar was originally part of #1002, but I think it got removed/reworked.
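The branch itself has bitrotted away, so the Go sketch below is only a reconstruction of the idea, not the actual patch: duplicate the child’s stdio into a debug file so whatever runc prints before hanging survives. The runc invocation and the /tmp log path are placeholders, and this assumes runc is on PATH.

```go
package main

import (
	"io"
	"os"
	"os/exec"
)

func main() {
	// hypothetical debug sink; the real hack wired this into the shim
	debug, err := os.Create("/tmp/runc-debug.log")
	if err != nil {
		panic(err)
	}
	defer debug.Close()

	cmd := exec.Command("runc", "--debug", "list") // stand-in for any runc invocation
	// tee the child's stdout/stderr so nothing gets swallowed
	cmd.Stdout = io.MultiWriter(os.Stdout, debug)
	cmd.Stderr = io.MultiWriter(os.Stderr, debug)
	if err := cmd.Run(); err != nil {
		// the exit error itself is often part of the clue
		io.WriteString(debug, "runc exited: "+err.Error()+"\n")
	}
}
```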
I’ve also resorted to

strace -o foo -fff -s 4096 -p $(pidof containerd)

in the past, starting the problematic container and then tracing the execs back from the foo.NNN file corresponding to the hung runc through the three phases. Often you find a write(1, "some smoking gun") in there somewhere, or some other obvious error, like an open returning EPERM or EINVAL (since the three phases of runc startup are short and linear, it’s pretty easy to follow the strace, unlike with most Go programs).

Perhaps it is worth rebasing my above hack and seeing if it leaves some clues somewhere, or maybe #1002 ended up logging that output somewhere else?