moby: [BUG] Docker-proxy binds on ports but no container is running (anymore)

Output of docker version:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 6
 Running: 0
 Paused: 0
 Stopped: 6
Images: 93
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 211
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.748 GiB
Name: motocom.2hg.org
ID: O4YH:4KEL:GEY3:C26M:QWMN:PCLK:RF6E:DQSG:XPWG:SSBE:FVE3:GWBL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.): This is a dedicated server running Debian Jessie. Everything is up to date.

# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
# netstat -ap
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 *:3389                  *:*                     LISTEN      890/openvpn
tcp        0      0 localhost:mysql         *:*                     LISTEN      1276/mysqld
tcp        0      0 *:ssh                   *:*                     LISTEN      727/sshd
tcp        0    336 xxx:ssh     ip1f10d8b9.dynami:60637 ESTABLISHED 1641/sshd: xxx
tcp        0      1 xxx:ssh     116.31.116.52:53774     FIN_WAIT1   -
tcp6       0      0 [::]:smtp               [::]:*                  LISTEN      6346/docker-proxy
tcp6       0      0 [::]:5280               [::]:*                  LISTEN      6366/docker-proxy
tcp6       0      0 [::]:imaps              [::]:*                  LISTEN      6316/docker-proxy
tcp6       0      0 [::]:5443               [::]:*                  LISTEN      6356/docker-proxy
tcp6       0      0 [::]:xmpp-client        [::]:*                  LISTEN      6386/docker-proxy
tcp6       0      0 [::]:submission         [::]:*                  LISTEN      6326/docker-proxy
tcp6       0      0 [::]:imap2              [::]:*                  LISTEN      6336/docker-proxy
tcp6       0      0 [::]:xmpp-server        [::]:*                  LISTEN      6376/docker-proxy
[...]

Steps to reproduce the issue:

  1. I upgraded from version 1.12.0 to 1.12.1.
  2. I tried to restart my containers but they failed to allocate ports.
  3. The ports are allocated by docker-proxy but no container is running.

Describe the results you received: I cannot start the containers because the ports are already in use.

Describe the results you expected: docker-proxy should not bind ports that are not used by any container.

Additional information you deem important (e.g. issue happens only occasionally):

I already tried restricting Docker to IPv4, but that did not change anything. I also tried restarting Docker and the server, but Docker still allocates many ports. I use docker-compose to manage my containers, if that matters.

Is any more information needed? Is there a quick workaround to make Docker realize that the ports are no longer needed?

About this issue

  • Original URL
  • State: open
  • Created 8 years ago
  • Reactions: 29
  • Comments: 77 (13 by maintainers)

Most upvoted comments

Well, I’ve found a workaround:

  • stop docker
  • remove all internal docker networks: rm -rf /var/lib/docker/network/files/
  • start docker
docker ps -a | awk 'NR>1 {print $1}' | xargs docker rm -f

Is it really necessary?

Like @rdavaillaud, this workaround worked for me:

systemctl stop docker
rm -rf /var/lib/docker/network/files
systemctl start docker

I am getting this basically weekly.

Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.1
 Git commit:        2d0083d
 Built:             Fri Aug 16 14:20:06 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       2d0083d
  Built:            Wed Aug 14 19:41:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Why is this marked closed?

@cpuguy83 When I start the docker daemon, it starts docker-proxy processes for ports even though no container that uses those ports exists anymore. Trying to kill the docker-proxy processes doesn’t help, because they do not exit and end up as zombie processes, so they do not release the ports.

So I had no way to use the ports again while docker was running.
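
A quick way to verify this state, as a sketch (assuming standard Linux tooling such as ss and ps is available):

docker ps -q
# prints nothing, i.e. no containers are running

sudo ss -tlnp | grep docker-proxy
# yet docker-proxy processes still hold the published ports

ps -eo pid,ppid,stat,cmd | grep '[d]ocker-proxy'
# a "Z" in the STAT column marks a proxy that has exited but has not yet been reaped by dockerd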

Well, I’ve found a workaround:

  • stop docker
  • remove all internal docker networks: rm -rf /var/lib/docker/network/files/
  • start docker

More than five years later, this is still the only available workaround, with no solution. That’s interesting, to say the least.

This issue is a side-effect of the changes added to support the --live-restore feature.

From Docker 1.12.0 onward, when the bridge network driver comes up, it restores the bridge network endpoints it finds in the store. While doing this, it also restores the port bindings associated with each endpoint, if any.

Note:

  • Under normal conditions at daemon boot, no endpoints are present in the store.
  • If stale endpoints are present (usually the case after an ungraceful shutdown of the daemon with running containers), they are expected to be removed during boot as part of the stale sandbox cleanup run by the libnetwork core.
  • If endpoints are present because of live-restore, they will not be removed, because the sandbox cleanup does not happen for containers that are still running.

The issue seems to be that the sandbox for stale endpoints from a run of an older docker version is not present, so libnetwork core does not invoke the driver’s cleanup of the stale endpoints.

I believe the stale endpoints issue can be fixed by removing the networks and restarting the daemon, because during the bridge endpoint restore an endpoint is discarded and removed from the store if its corresponding network is missing.
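
In practice that suggestion looks roughly like this (a sketch; "myproject_default" is a hypothetical network name, compose-created networks are usually named <project>_default):

# list the networks known to the daemon
docker network ls

# remove the user-defined network(s) whose endpoints are stale
docker network rm myproject_default

# restart the daemon so the bridge driver re-reads the store
systemctl restart docker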

If the above does not work, one solution is to manually remove the problematic docker/network/v1.0/bridge-endpoint/<id> key from the store. I just found this CLI tool to browse and modify a boltdb store, but have not had much luck with it so far (https://github.com/br0xen/boltbrowser).

Otherwise, the last resort is to remove the /var/lib/docker/network/files/local-kv.db file before starting the daemon.
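
Before deleting the whole file, it is possible to check whether stale bridge endpoints are actually recorded in it; a rough sketch, assuming the store is the boltdb file at the default path and the strings utility is installed (strings only dumps printable keys, it does not modify anything):

# stop the daemon so the store is no longer being written to
systemctl stop docker

# look for leftover bridge-endpoint keys in the libnetwork store
strings /var/lib/docker/network/files/local-kv.db | grep bridge-endpoint

# last resort: move the store aside (safer than rm) and let the daemon recreate it
mv /var/lib/docker/network/files/local-kv.db /root/local-kv.db.bak
systemctl start docker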

On a side note, there is also a bug that can cause this issue. It is explained and fixed in https://github.com/docker/libnetwork/pull/1504 and will be available in the next release.

We appear to be experiencing this same problem.

docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:11:10 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:11:10 2016
 OS/Arch:      linux/amd64

docker info:

Containers: 69
 Running: 55
 Paused: 0
 Stopped: 14
Images: 253
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 696
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-36-generic
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 47.16 GiB
Name: container2
ID: CC32:U2QZ:HQTG:KK45:NYWG:LHF4:HVSE:UTEA:RYTR:3E4L:OS5S:MEVI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

After restarting the Docker daemon (in order to fix a presumably unrelated problem with an unstoppable container) on an instance with 70 containers, approximately 44 of which have exposed ports, four of the containers were unable to start (even after deleting & recreating them) because their ports were in use by docker-proxy instances. I tried killing one of the instances, but it just became a zombie, which Docker still hasn’t reaped after about 10 minutes.

I ran the command below and it worked for me: systemctl stop docker; systemctl start docker

A coworker had this exact problem; changing the storage driver from aufs to devicemapper helped. Removing things from /var/lib/docker didn’t help.

I’m having the same problem on Docker for Mac.

docker info
Containers: 27
 Running: 4
 Paused: 0
 Stopped: 23
Images: 250
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 310
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.19-moby
Operating System: Alpine Linux v3.4
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 2.934 GiB
Name: moby
ID: 2STL:DVY7:P6IW:M2IO:I35K:MP3R:LCE7:NPJQ:IQYN:5OSY:GHLJ:6VUR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 66
 Goroutines: 82
 System Time: 2016-09-08T16:36:04.267966976Z
 EventsListeners: 1
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8

But I have no /var/lib/docker to clean up 😦 (I used the “reset to factory settings” option of Docker for Mac and thus lost all images, but the error is gone).

After deleting local-kv.db and recreating all defined networks, the services are back online.

This just happened; restarting the service and rebooting the host machine did not work. Only the plain docker service is in use, no docker-compose.

error logs:

Jul 06 12:50:22 x86 docker[13475]: docker: Error response from daemon: driver failed programming external connectivity on endpoint webdav (e083dcf886d1f038115ec3d6f2cc0741d9650e0b011b998f596701398d427ceb): Bind for 0.0.0.0:7280 failed: port is already allocated.
# ss -tulnp | grep 7280
tcp     LISTEN   0        128                    *:7280                 *:*      users:(("docker-proxy",pid=819,fd=4)) 
# docker info
Client:
 Debug Mode: false

Server:
 Containers: 11
  Running: 10
  Paused: 0
  Stopped: 1
 Images: 38
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-9-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 31.33GiB
 Name: x86
 ID: OTHX:RFXS:3C3E:4CZW:Q4ZE:6ION:GB36:E75T:LVHE:QV52:WJ2U:PMCY
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

@rdavaillaud’s

Well, I’ve found a workaround:

  • stop docker
  • remove all internal docker networks: rm -rf /var/lib/docker/network/files/
  • start docker

Did the trick on my end, thanks!

UPDATE 05/23/19: Encountered the issue again; deleting just the files/local-kv.db file instead of the whole files/ folder accomplished the same result.

@amitkumarj441 Thanks. I will update Docker, as I need a permanent solution to this.

@cpuguy thanks, I’ll do it asap.

@marcelmfs please update to 1.12.5. The above-mentioned bug should be fixed there (as of 1.12.3).

I just saw this behaviour in 1.12.1, docker-compose 1.8.1.

We have some short-lived containers that bind to ports for registration purposes. After a few runs, docker-proxy processes were still running, referencing old runs of containers that had already been stopped and removed. At some point Docker reuses one of the IPs, but the old iptables rules are still in place, which causes all sorts of routing problems (a request for one service ends up reaching another).

This is really bad. It will happen in production (given that we expect Docker to be a long-lived process).
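
The stale rules can be spotted before they cause trouble; a minimal check, assuming the default bridge setup where Docker programs DNAT rules into the DOCKER chain of the nat table:

# list the DNAT rules Docker has programmed for published ports
sudo iptables -t nat -S DOCKER

# compare the container IPs referenced in those rules with what is actually running
docker ps --format '{{.Names}} {{.Ports}}'
docker network inspect bridge --format '{{json .Containers}}'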

Confirmed temp-fix:

# systemctl stop docker; rm /var/lib/docker/network/files/local-kv.db; systemctl start docker
# docker ps -a | awk 'NR>1 {print $1}' | xargs docker rm -f

Afterwards a clean env is available and containers are able to bind to their respective ports.

Hi!

I am not 100% sure because it happened a while ago, but I think I removed everything including the networks and the problem remained. But maybe I didn’t restart afterwards or something like that. Nevertheless thanks for the analysis!

Removing /var/lib/docker/network/files/ worked for me.

I work with @rdavaillaud; we have 4 PCs using this Docker and Compose configuration, and this happened on only one of them, one day after a computer reboot.

So we don’t really know how to reproduce it.

@rdavaillaud @kossmoss I couldn’t recreate the issue, but I wasn’t using Compose. All the reported occurrences seem to be happening with Compose. Can you give the exact steps you are using that result in the issue?

@cpuguy83 sorry for the late reply. Probably “zombie” is not the proper term for that; I was limited in time, didn’t investigate much, and can’t say what exactly those processes should be called. My way of looking at currently open ports is sudo netstat -tunlp or sudo service docker status. In both cases, after the docker daemon has started, I see those docker-proxy processes occupying the ports, but at the same time I see no running containers in docker ps output.

When I kill a docker-proxy process, it is actually destroyed and the port is not occupied again until the next docker daemon restart.

When the docker daemon is stopped, there are no running docker-proxy processes; they only run while the daemon is running. They seem to be started automatically at some point during the daemon’s startup procedure. It also looks like Docker holds those processes’ state somewhere while stopped and brings them back to life when started, but does not link them to any running containers.
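
That matches what the process tree shows; a quick check (a sketch, assuming GNU ps and that the daemon binary is dockerd):

# list dockerd's direct children: the docker-proxy processes appear here
# even when `docker ps` reports no running containers
ps --ppid "$(pidof dockerd)" -o pid,stat,cmd | grep docker-proxy
docker ps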

Maybe this problem is caused by starting containers via docker-compose instead of plain docker run? @hopeseekr, are you using docker-compose?

Hey @nagua,

It isn’t really possible to simply close a port from outside the application that opened the socket listening on it. The only way to do this is to completely kill the process that owns the port. Then, in about a minute or two, the port will become available again for use. Here’s what’s going on (if you don’t care, skip to the end where I show you how to kill the process owning a particular port):

Anyway, here’s how to kill a process that owns a particular port:

sudo netstat -ap | grep :<port_number>

That will output the line corresponding to the process holding port <port_number>. Then, look in the last column: you’ll see <pid>/<program name>. Then execute this:

kill <pid>
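
If netstat is not available, the same information can be obtained with other common tools (port 7280 from earlier in this thread is used here purely as an example):

# show the listening process
sudo ss -tlnp '( sport = :7280 )'
sudo lsof -iTCP:7280 -sTCP:LISTEN

# or find and kill the owner in one step
sudo fuser -k 7280/tcp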

For more info: #6675 and #6682