compose: docker-compose hangs
Description of the issue
On a fresh server installation (a VPS provided by OVH) running Ubuntu 18.x, Docker itself works flawlessly and fast, but docker-compose sometimes hangs indefinitely for no apparent reason. This happens with every command: version, ps, up, logs, and so on.
I tried everything related to DNS and name resolution, even disabling systemd-resolved.
When it works it is fast. When it hangs I can see the /tmp/_Mxyz123123
directory being created.
Running with --verbose, it hangs before producing any output, and then completes flawlessly.
Context information (for bug reports)
Output of docker-compose version
docker-compose version 1.24.0, build 0aa59064
docker-py version: 3.7.2
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j 20 Nov 2018
Output of docker version
Client:
Version: 18.09.5
API version: 1.39
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:43:57 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.5
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:10:53 2019
OS/Arch: linux/amd64
Experimental: false
Output of docker-compose config
(Make sure to add the relevant -f and other flags)
... not related to a specific docker-compose file
Steps to reproduce the issue
- docker-compose <command>
Observed result
Sometimes hangs
Expected result
Runs fast.
Stacktrace / full error message
No stacktrace, but I managed to strace the behavior.
[...]
openat(AT_FDCWD, "/tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile> <0.000038>
fstat(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000020>
write(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, "# Generated automatically from M"..., 69632) = 69632 <0.000130>
fchmod(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, 0700) = 0 <0.000050>
write(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, "type.o -o Modules/xxsubtype$(E"..., 43) = 43 <0.000029>
close(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>) = 0 <0.000021>
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2088645a10) = 3110 <0.000276>
rt_sigaction(SIGINT, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 <0.000051>
rt_sigaction(SIGKILL, {sa_handler=0x404840, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, 0x7ffe82f7d900, 8) = -1 EINVAL (Invalid argument) <0.000031>
rt_sigaction(SIGTERM, {sa_handler=0x404840, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 <0.000028>
wait4(-1,
[... here it hangs, and then after ~114s the output continues ...]
wait4(-1, docker-compose version 1.24.0, build 0aa59064
docker-py version: 3.7.2
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j 20 Nov 2018
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3110 <114.945565>
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3110, si_uid=0, si_status=0, si_utime=43, si_stime=6} ---
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, 8) = 0 <0.000094>
rt_sigaction(SIGKILL, {sa_handler=SIG_DFL, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, 0x7ffe82f7d900, 8) = -1 EINVAL (Invalid argument) <0.000150>
[...]
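For context on the wait4 call above (this is an assumption based on the /tmp/_MEIxxxx paths, not something stated in the compose sources): docker-compose is shipped as a PyInstaller one-file binary whose bootloader extracts itself into a /tmp/_MEIxxxx directory, forks a child that runs the real Python program, and then blocks in wait4 until the child exits. So whenever the child blocks, the parent's strace shows a long wait4, exactly as in the 114s gap above. A minimal sketch of that parent/child pattern:

```python
import os
import tempfile
import time

# Stand-in for PyInstaller's /tmp/_MEIxxxx extraction directory.
workdir = tempfile.mkdtemp(prefix="_MEI")

pid = os.fork()
if pid == 0:
    # Child: the real program runs here; any entropy wait happens in this
    # process, not in the parent.
    time.sleep(0.1)
    os._exit(0)

# Parent: blocks here, just like the wait4(-1, ...) in the strace above.
_, status = os.waitpid(pid, 0)
print("child exited with status", os.WEXITSTATUS(status))
```

The practical consequence is that stracing only the parent process shows nothing but a long wait; the interesting syscalls are in the child, so `strace -f` (follow forks) is needed to see them.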
Additional information
Linux vps[omit] 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux Ubuntu 18.04.2 LTS
docker-compose installed with curl
I have lot of other docker-compose environments working and this is the first time I see this behavior. Any idea on what to check or where to look?
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 41
- Comments: 51
Commits related to this issue
- Debugging if entropy is an issue https://github.com/docker/compose/issues/6678 — committed to aminvakil/self-hosted by aminvakil 2 years ago
Same behavior for me as well. I’m running on a Digital Ocean droplet running Ubuntu 18.04.2 LTS with 4.15.0-48-generic x86_64 GNU/Linux kernel. Installed via curl today.
I want to also confirm that installing haveged fixed our problem, on a Digital Ocean Droplet. We used this article to install haveged: https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged
I hope this helps anyone, just in case.
No problem, it makes sense as it's a new VPS with very little use. And installing haveged fixes the problem.

Not long before the /dev/random dance I described, I see references to cryptography.hazmat.bindings._openssl.so in the traces. With that knowledge this comment seems awfully telling: https://github.com/pyca/cryptography/blob/master/src/_cffi_src/openssl/src/osrandom_engine.c#L123...L127

Which in turn seems to derive from: https://github.com/pyca/cryptography/issues/4645 https://github.com/pyca/cryptography/pull/4656
So I feel pretty sure in declaring that this isn’t a compose issue as such, but some combination of Python’s behaviour and lack of entropy on some systems. Installing haveged or some other entropy daemon or otherwise ensuring you have sufficient entropy seems like the right answer.
I’m going to close since this does not appear to be a bug in compose, thanks for your help in getting to the bottom of the issue.
FWIW https://lwn.net/Articles/693189/ is an interesting article on a related subject (I’m not sure whether or not this issue is the one being described there, but the symptoms are certainly similar).
Why should “docker-compose --help” require system entropy?
So your process tree is:

The two processes 29718 and 29719 execute fully in <1s, e.g. dc.29718 is

Which I think rules out something slowing ldconfig down, at least in your case.

It's the two main processes which take some time:

In dc.29714 (the parent) there is a gap of ~48s after 10:34:01.454543:

Which seems to be the parent waiting for the child.

In the child (dc.29717) the gap is after 10:34:01.454593:

Aha! So the process is blocked in a poll syscall (poll([{fd=3, events=POLLIN}], 1, -1)) where the only passed fd is fd 3, which is a recently opened fd onto /dev/random (openat(AT_FDCWD, "/dev/random", O_RDONLY) = 3).

The /dev/random device is well known for blocking when there is insufficient entropy on the system. And VPSs/cloud VMs are well known for having a general shortage of entropy.

There is no use of /dev/random in compose (outside of a single unrelated test case), nor is there in docker-py, so this must be in one of our dependencies.

Although I cannot reproduce the hang, I can see the use of /dev/random in my own strace (my system must have plenty of entropy, I guess).

The maddest bit is that after blocking waiting for /dev/random to become readable, it immediately closes it and opens /dev/urandom (the non-blocking one) instead and reads from that.

Well, thank you for those logs, they have at least given me a string to pull on and see where it takes me…
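That "maddest bit" is a deliberate pattern (also visible in pyca/cryptography's osrandom engine, per the links above): /dev/random readability is used only as a signal that the kernel entropy pool has been initialized, and the actual bytes then come from the never-blocking /dev/urandom. A rough sketch of the pattern, with a function name that is ours, not the library's:

```python
import os
import select

def wait_until_entropy_pool_ready(timeout=None):
    """Poll /dev/random until it is readable (i.e. the kernel entropy pool
    is initialized), then the caller can safely read from /dev/urandom.
    Returns False if the timeout expires first."""
    fd = os.open("/dev/random", os.O_RDONLY)
    try:
        # On an entropy-starved VPS, this select() is where the 114s hang
        # in the strace would live (the real code passes an infinite timeout).
        readable, _, _ = select.select([fd], [], [], timeout)
        return bool(readable)
    finally:
        os.close(fd)

if wait_until_entropy_pool_ready(timeout=5):
    # Read the actual random bytes from the non-blocking device.
    with open("/dev/urandom", "rb") as urandom:
        print(urandom.read(8).hex())
```

This explains why only fresh, idle VPSs hit the hang: on a machine that has been generating entropy for a while, the select() returns immediately.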
This happens to me as well, running on a Google Compute Engine instance with the same docker versions as above. Also Ubuntu 18.04. Even docker-compose --help does this, so no docker-compose.yaml is involved.
Yeah! I had a value of 6 in entropy_avail; after haveged installation the number increased to ~2500, and now docker-compose commands are running pretty fast.

Very strange issue. @ijc thank you for your investigation; that issue was so unexpected, but interesting to discover.
Installing haveged fixed this problem for me.
CentOS 7 docker 19.03.5 docker-compose 1.25.0
This isn't really fixed; you're just giving workarounds and patches… the issue is still happening even on the latest version.
I don't mean anything bad by my post, but there's a difference between a closed issue and one that is left hanging because the proper solution hasn't been found yet.
Even if the author is satisfied, it doesn't mean we should ignore this issue, no?
Don't get me wrong.
FWIW, installing jitterentropy-rngd on a small server running Ubuntu 20.04 (via apt install jitterentropy-rngd) resolved this problem for me. Now cat /proc/sys/kernel/random/entropy_avail shows a reasonable amount of entropy available (even after a fresh boot) and the hangs from docker-compose --help or docker-compose --version are significantly reduced.

@sebastialonso As a recap on this issue: the problem is that Docker requires system entropy. Sometimes the OS does not provide enough "natural entropy", for a number of reasons. In those cases you need to "fix the OS" by generating more entropy, which is what haveged is built for. Without that, Docker waits for enough entropy, and how long depends on what the OS is doing in the background.

@xcash I'm running on Docker version 18.09.6, build 481bc77 and it's still happening. Ubuntu 18.04.2 LTS x64 with everything up to date.
As a workaround: call the docker-compose command once and hit enter; after that, issue your desired command.

I am still having the same problem, with everything upgraded, but can confirm that running docker-compose on its own does seem to fix it.

@sebastialonso After reading this thread I solved it by installing haveged (https://github.com/docker/compose/issues/6678#issuecomment-494398740) on my VPS to get more entropy. Here is another link on the same subject: https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged
@ijc congrats on the troubleshooting, Ian! 😃 It makes sense even in my setup, as the problem disappeared after that VPS went into production and thus started creating entropy.
While this is a system issue, should docker-compose do some reasonable check of available entropy and at least issue a warning?
As is, running a compose command locks up the system. So even though it's not a bug in compose, trying to use compose is a killjoy.
If this was a problem for you please upvote https://github.com/docker/compose/issues/6931 so we can get better docs, which would have avoided lots of wasted time chasing down this problem.
Hi @M034B5
This is a typical "won't fix" case, since this is not an issue in Docker or Compose. It's a general system entropy issue. If a piece of software needs entropy and the system is not providing enough, that's an OS issue. And everything you do to bypass that is a general workaround.
Indeed. But here the proper solution is having a good amount of entropy provided by the operating system. 😃 You are searching for the "proper solution" in the wrong place.
Don't get me wrong either, and note I'm not the author. I'm only providing what I think is the right point of view on this issue and why it's closed. 😃
I’m not sure I want to upload a bunch of logs to a public place when I don’t really know what’s in them.
Same problem for me, Linode with Ubuntu 18.04.2. Going back to version 1.23.0 fixed it, although I had to recreate the containers.