moby: Make Docker ps more robust to problem containers

As it stands there are quite a few long standing issues with docker ps hanging.

https://github.com/docker/docker/issues/25993 https://github.com/docker/docker/issues/12606

and so on.

At the root one of the biggest issues is that if a single container is blocking, you can no longer iterate through the container list:

can reducePSContainer be made more robust so it can skip a container if it is locked for more than N seconds and prints out the name of the problem container.

This would make it significantly easier to debug this issue and isolate the underlying cause.

About this issue

Original URL
State: closed
Created 8 years ago
Comments: 28 (18 by maintainers)

Most upvoted comments

Indeed I am. Still some issues to work out but it’s coming along. https://github.com/cpuguy83/docker/compare/container_memdb

cpuguy83 on Nov 9, 2016

Basic idea is to strip the container object down to configuration only and move runtime state (e.g. IO streams, log driver, etc) to a separate object. The container object will be on a transactional in-memory store where when requested the daemon gives copies of the container object rather than a pointer reference, which means there is no longer a need to lock this object. It can be used for reporting at any time without risk of blocking.

State manipulation actions (such as start/stop) will still require a lock, but not on the container object itself. This ensures the daemon can still report status, and potentially even warn the user of a problem.

cpuguy83 on Nov 9, 2016

@SamSaffron we are on v1.12.6, but the bug causing the lockup during starts in containerd hasn’t been released yet (it’s on the v1.13.x tree, but not in v1.13.1-rc1): docker/containerd#436

On the docker side, it seems valid to hold a lock during (*Daemon).containerStart(). I’m not sure that releasing the lock on a timeout would be safe (a container could be orphaned in containerd in a weird state?), so it looks like the fix on the docker daemon side is not requiring locks for each container in (*Daemon).reducePsContainer().

fabiokung on Feb 7, 2017

I think we should pull in the “don’t hold container lock on exec exit” fix for a hypothetical 1.12.x at the very least.

cpuguy83 on Nov 30, 2016

On master I don’t think we are even taking the container lock here, which seems to make sense.

cpuguy83 on Nov 28, 2016