moby: [BUG] Docker-proxy binds on ports but no container is running (anymore)
Output of docker version:
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:02:53 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:02:53 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 6
Running: 0
Paused: 0
Stopped: 6
Images: 93
Server Version: 1.12.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 211
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.748 GiB
Name: motocom.2hg.org
ID: O4YH:4KEL:GEY3:C26M:QWMN:PCLK:RF6E:DQSG:XPWG:SSBE:FVE3:GWBL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.): This is a dedicated server running Debian Jessie. Everything up to date.
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# netstat -ap
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 *:3389 *:* LISTEN 890/openvpn
tcp 0 0 localhost:mysql *:* LISTEN 1276/mysqld
tcp 0 0 *:ssh *:* LISTEN 727/sshd
tcp 0 336 xxx:ssh ip1f10d8b9.dynami:60637 ESTABLISHED 1641/sshd: xxx
tcp 0 1 xxx:ssh 116.31.116.52:53774 FIN_WAIT1 -
tcp6 0 0 [::]:smtp [::]:* LISTEN 6346/docker-proxy
tcp6 0 0 [::]:5280 [::]:* LISTEN 6366/docker-proxy
tcp6 0 0 [::]:imaps [::]:* LISTEN 6316/docker-proxy
tcp6 0 0 [::]:5443 [::]:* LISTEN 6356/docker-proxy
tcp6 0 0 [::]:xmpp-client [::]:* LISTEN 6386/docker-proxy
tcp6 0 0 [::]:submission [::]:* LISTEN 6326/docker-proxy
tcp6 0 0 [::]:imap2 [::]:* LISTEN 6336/docker-proxy
tcp6 0 0 [::]:xmpp-server [::]:* LISTEN 6376/docker-proxy
[...]
Steps to reproduce the issue:
- I upgraded from version 1.12.0 to 1.12.1.
- I tried to restart my containers but they failed to allocate ports.
- The ports are allocated by docker-proxy but no container is running.
Describe the results you received: I cannot start the containers because the ports are already in use.
Describe the results you expected: docker-proxy should not allocate ports that Docker no longer uses.
Additional information you deem important (e.g. issue happens only occasionally):
I already tried to restrict Docker to IPv4, but that did not change anything. I also tried restarting Docker and the server, but Docker still allocates many ports. I use docker-compose to manage my containers, if that matters.
Is any more information needed? Is there a quick workaround available to make Docker think that the ports are not needed anymore?
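A quick diagnostic for this situation (a sketch, not part of the original report; port 5280 is borrowed from the netstat output above as an example) is to cross-check whether a docker-proxy process still holds a port that no running container publishes:

```shell
# Example port taken from the netstat output above.
PORT=5280

# Is a docker-proxy process still listening on the port?
sudo netstat -tlnp 2>/dev/null | grep -E ":${PORT} .*docker-proxy" \
  || echo "no docker-proxy on port ${PORT}"

# Does any *running* container actually publish it?
docker ps --format '{{.Names}} {{.Ports}}' | grep -- "${PORT}" \
  || echo "no running container publishes ${PORT}"
```

If the first command matches and the second prints nothing, you are in the state this issue describes.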
About this issue
- Original URL
- State: open
- Created 8 years ago
- Reactions: 29
- Comments: 77 (13 by maintainers)
Well, I’ve found a workaround:
is it really necessary?
Like @rdavaillaud this workaround work for me:
I am getting this basically weekly.
Why is this marked closed?
@cpuguy83 When I started the Docker daemon, it started up docker-proxy processes for ports where no container using those ports exists anymore. Trying to kill the docker-proxy processes doesn't help, because they do not exit and end up in a zombie state, so they do not release the ports.
So I had no way to use the ports again while docker was running.
More than five years later, this is still the only available workaround, with no real solution. That's interesting, to say the least.
This issue is a side-effect of the changes added to support the --live-restore feature. From docker 1.12.0 onward, when the bridge network driver comes up, it restores the bridge network endpoints it finds in the store. While doing this, it also restores the port bindings associated with each endpoint, if any.
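A read-only sketch (an illustration, not from the original comment; it assumes the default userland proxy and iptables-based NAT) of how to inspect what the bridge driver restored at daemon startup:

```shell
# NAT rules recreated for the restored endpoints:
sudo iptables -t nat -S DOCKER
# Userland proxy processes holding the restored port bindings:
ps -o pid,ppid,args -C docker-proxy
# The local store the endpoints are restored from (mentioned later in
# this thread as /var/lib/docker/network/files/local-kv.db):
sudo ls -l /var/lib/docker/network/files/
```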
Note: the issue seems to be that the sandbox for stale endpoints from an older docker version's run is not present, so libnetwork core does not invoke the cleanup of the stale endpoints with the driver.
I believe the stale endpoints issue can be fixed by removing the networks and restarting the daemon. Because during the bridge endpoint restore, the endpoint is discarded and removed from store if the corresponding network is missing.
If the above does not work, one solution is to manually remove the problematic docker/network/v1.0/bridge-endpoint/<id> key/value from the store. I just found this CLI tool to browse and modify a boltdb store, but did not have much luck using it so far: https://github.com/br0xen/boltbrowser. Otherwise, the last resort is to remove the /var/lib/docker/network/files/local-kv.db file before starting the daemon.

On a side note, there is also a bug which will cause this issue. It is explained and fixed in https://github.com/docker/libnetwork/pull/1504 and will be available in the next release.
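The last-resort step could look like this (a sketch assuming systemd; moving the file aside instead of deleting it keeps a backup, and user-defined networks will need to be recreated afterwards):

```shell
sudo systemctl stop docker
# Move the network store aside rather than deleting it outright.
sudo mv /var/lib/docker/network/files/local-kv.db \
        /var/lib/docker/network/files/local-kv.db.bak
sudo systemctl start docker
```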
We appear to be experiencing this same problem.
docker version
:docker info
:After restarting the Docker daemon (in order to fix a presumably unrelated problem with an unstoppable container) on an instance with 70 containers, approximately 44 of which have exposed ports, four of the containers were unable to start (even after deleting & recreating them) because their ports were in use by
docker-proxy
instances. I tried killing one of the instances, but it just became a zombie, which Docker still hasn’t reaped after about 10 minutes.I did below command. It works for me. systemctl stop docker; systemctl start docker
A coworker had this exact problem; changing the fs driver from aufs to devicemapper helped. Removing stuff from /var/lib/docker didn't help.
I’m having the same problem on docker for Mac.
But I have no /var/lib/docker to clean up 😦 (I have used the “reset to factory settings” of docker for mac and thus lost all images, but the error is gone)
After deleting local-kv.db and recreating all defined networks, services are now back online.
It just happened; restarting the service/rebooting the host machine did not work, only restarting the Docker service did. No docker-compose used.
error logs:
@rdavaillaud's workaround did the trick on my end, thanks!
UPDATE 05/23/19: Encountered the issue again; deleting just the files/local-kv.db file instead of the files/ folder accomplished the same result.

@amitkumarj441 Thanks. I will update Docker, as I need a permanent solution to this.
@cpuguy thanks, I’ll do it asap.
@marcelmfs please update to 1.12.5. The above mentioned bug should be fixed there (as of 1.12.3)
I just saw this behaviour in 1.12.1, docker-compose 1.8.1.
We have some short-lived containers that bind to some ports for registration purposes. After some runs, docker-proxy processes were still running, referencing old runs of already stopped & removed containers. At some point Docker reuses one of the IPs, but the iptables rules are already in place, and that causes all sorts of routing problems (a request for one service reaches another).
This is really bad. It will happen in production (given that we expect docker to be a long lived process).
Confirmed temp-fix:
Afterwards a clean env is available and containers are able to bind to their respective ports.
Hi!
I am not 100% sure because it happened a while ago, but I think I removed everything including the networks and the problem remained. But maybe I didn’t restart afterwards or something like that. Nevertheless thanks for the analysis!
removing /var/lib/docker/network/files/ worked for me.
I work with @rdavaillaud and we have 4 PCs using this Docker and Compose configuration; this happened on only one of the PCs, one day after a computer reboot.
So we don't really know how to reproduce it.
@rdavaillaud @kossmoss I couldn't recreate the issue. But I wasn't using compose, and all the reported occurrences seem to be happening with compose. Can you give the exact steps you are using that result in the issue?
@cpuguy83 sorry for the late reply. Probably "zombie" is not the proper term for that; I was limited in time, didn't perform much investigation, and can't say what exactly we should call those processes. My way to look at currently open ports is the sudo netstat -tunlp command or sudo service docker status. In both cases, after the Docker daemon starts, I see those docker-proxy processes occupying the ports, but at the same time see no running containers in docker ps output.
When I kill a docker-proxy process, it is actually destroyed, and the port is not occupied until the next Docker daemon restart.
When the Docker daemon is stopped, there are no running docker-proxy processes. They only run while the Docker daemon is running; they seem to be started automatically at some point during the daemon's startup procedure. It looks like Docker holds those processes' state somewhere while stopped and brings them back when started, but does not link them to any running containers.
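The observation above can be checked directly (a sketch, not from the original comment; on 1.12-era packages the daemon process may be named `docker` rather than `dockerd`):

```shell
# The daemon's pid:
pgrep -x dockerd
# Each docker-proxy process with its parent; the PPID column should
# match the daemon's pid, confirming the daemon (not a container)
# spawns and owns these proxies.
ps -o pid,ppid,args -C docker-proxy
```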
Maybe this problem is caused by starting containers via docker-compose instead of a simple docker run? @hopeseekr, are you using docker-compose?
Hey @nagua,
It isn’t really possible to simply close a port from outside the application that opened the socket listening on it. The only way to do this is to completely kill the process that owns the port. Then, in about a minute or two, the port will become available again for use. Here’s what’s going on (if you don’t care, skip to the end where I show you how to kill the process owning a particular port):
Anyway, here’s how to kill a process that owns a particular port:
sudo netstat -ap | grep :<port_number>
That will output the line corresponding to the process holding port <port_number>. Then look in the last column; you'll see <pid>/<program name>. Then execute this:
kill <pid>
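Equivalent one-liners (not from the original comment; the port is a placeholder) that skip the manual netstat step:

```shell
PORT=5280   # placeholder port
# Print just the pid of the process listening on the port:
sudo lsof -t -iTCP:${PORT} -sTCP:LISTEN
# Or kill the owner of the port directly:
sudo fuser -k ${PORT}/tcp
```

Note that if the owner is a docker-proxy spawned by the daemon, the comments above suggest the daemon may simply respawn it on its next restart.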
For more info: #6675 and #6682