moby: Intermittent failure connecting to port

I run a global nginx (or tomcat, it doesn’t matter) service in a swarm, publishing a port on each node (I am load balancing to my cluster and need the same port open everywhere):

docker service create --name my-nginx --with-registry-auth --mode global -p 7011:7011 -p 7010:7010 my-nginx:0.3

The service starts on every node fine. (yay)

On any node, I try to connect to the port, and sometimes it works:

# nc -zv localhost 7010
Connection to localhost 7010 port [tcp/http-alt] succeeded!

and sometimes it doesn’t:

# nc -zv localhost 7010
nc: connect to localhost port 7010 (tcp) failed: Connection refused

If I curl, I get similar results, but sometimes the request seems to time out.

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Output of docker info:

root@ip-172-31-35-10:/home/ubuntu/setup/nginx# docker info
Containers: 12
 Running: 4
 Paused: 0
 Stopped: 8
Images: 12
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /data/docker/aufs
 Backing Filesystem: extfs
 Dirs: 73
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: active
 NodeID: bd2rri2d7heioizz6ti7fs14o
 Is Manager: true
 ClusterID: 6poe5sl596j8c233f77da4vjt
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot interval: 10000
  Heartbeat tick: 1
  Election tick: 3
 Dispatcher:
  Heartbeat period: 5 seconds
 CA configuration:
  Expiry duration: 3 months
 Node Address: 172.31.35.10
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 4.4.0-31-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.795 GiB
Name: ip-172-31-35-10
ID: MUUA:UEUC:CC7V:WNU7:6SCE:JLD4:ULVI:YYH3:6R3D:EZI4:XIGW:F3QX
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

I am on AWS

Steps to reproduce the issue:

  1. Deploy a global service publishing port N.
  2. Try to connect to port N on any node (it fails intermittently).

Describe the results you received:

# nc -zv localhost 7010
Connection to localhost 7010 port [tcp/http-alt] succeeded!
# nc -zv localhost 7010
nc: connect to localhost port 7010 (tcp) failed: Connection refused

Describe the results you expected:

# nc -zv localhost 7010
Connection to localhost 7010 port [tcp/http-alt] succeeded!

All the time

Additional information you deem important (e.g. issue happens only occasionally):

It happens every time I start a service, but intermittently: maybe 50% of attempts fail.
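To put a number on the failure rate, the manual `nc` check can be repeated in a loop. The following is only a minimal sketch: `probe_port` is a hypothetical helper name, the attempt count of 20 and the 2-second timeout are arbitrary choices, and it assumes an `nc` build that supports `-z` and `-w`.

```shell
#!/bin/sh
# probe_port HOST PORT ATTEMPTS
# Hypothetical helper: opens a TCP connection ATTEMPTS times and
# prints how many attempts succeeded and how many failed.
probe_port() {
    host=$1
    port=$2
    attempts=$3
    ok=0
    fail=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -z: only check that the connection opens, send no data
        # -w 2: give up on each attempt after 2 seconds (assumed value)
        if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok fail=$fail"
}

# Same host and port as in the report; 20 attempts is an arbitrary sample size.
probe_port localhost 7010 20
```

Running this on each node and comparing the counts would show whether the failures are spread evenly across nodes or concentrated on some of them.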

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 40 (14 by maintainers)

Most upvoted comments

@mrjana Running 1.12.1, can confirm issues happen without any nodes being restarted. Restarting the daemon after the issue occurs helps sometimes, though.

Ah. Missed the fact that you're using Docker for AWS. There will be an update to Docker for AWS with a 1.12.2-rc1 engine version shortly. Stay tuned.