moby: Port publishing is broken in 17.05 Swarm Mode
Description
I have always used a command like this:
docker service create --name logstash --hostname logstash --mode replicated --endpoint-mode vip --with-registry-auth --log-driver=json-file --stop-grace-period=20s --restart-delay=20s --network onenet --publish 12203:12203/tcp --publish 12203:12203/udp docker.elastic.co/logstash/logstash:5.3.0
… to start one logstash container and have it listen on port 12203 (tcp/udp) on all nodes. It always worked as expected: the dockerd process started listening on these ports on all nodes in the cluster, and I was able to send messages to localhost:12203.
But after upgrading from 17.04 to 17.05, this feature is totally broken. After running this command, port 12203 is closed, and I don’t see the dockerd process listening on it. I’ve tried different variations:
12203:12203
12203:12203/tcp
12203:12203/udp
mode=ingress,target=12203,published=12203,protocol=tcp
mode=ingress,target=12203,published=12203,protocol=udp
I’ve also tried renaming the container and changing the port number (earlier I had a problem where the port was closed, but a stale record in Docker's key-value store made it think the port was already in use), with no success. I don’t even see any error messages in the logs.
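For reference, the short and long `--publish` forms listed above describe the same thing: the short form is `published:target[/protocol]`, with tcp as the default protocol and ingress as the default mode. A minimal sketch of that mapping follows; `publish_long` is a hypothetical helper name, not part of the Docker CLI.

```shell
#!/bin/sh
# publish_long (hypothetical helper): convert a short --publish spec of the
# form published:target[/protocol] into the equivalent long form.
publish_long() {
    spec=$1
    case $spec in
        */*) proto=${spec##*/}; ports=${spec%/*} ;;  # explicit protocol suffix
        *)   proto=tcp;         ports=$spec ;;       # tcp is the default
    esac
    published=${ports%%:*}   # left of the colon: port opened on the nodes
    target=${ports##*:}      # right of the colon: port inside the container
    printf 'mode=ingress,target=%s,published=%s,protocol=%s\n' \
        "$target" "$published" "$proto"
}

publish_long 12203:12203/udp
# prints: mode=ingress,target=12203,published=12203,protocol=udp
```

Since all of these variations resolve to the same ingress port configuration, it is not surprising that they behave identically.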
docker service ls still reports that all is fine:
# docker service ls | grep logs
aibwagzfoq8k logstash replicated 1/1 docker.elastic.co/logstash/logstash:5.3.0 *:12203->12203/tcp,*:12203->12203/udp
Also, when I change the command to mode=host,target=12203,published=12203,protocol=tcp, I see docker-proxy started and listening on this port:
# netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 :::12203 :::* LISTEN 26462/docker-proxy
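The same check can be repeated on every node with a small filter over the `netstat -ltnp` output. This is only a sketch: `port_listener` is a hypothetical helper name, and it assumes the column layout shown above.

```shell
#!/bin/sh
# port_listener (hypothetical helper): read `netstat -ltnp` output on stdin
# and print the PID/Program column of every TCP socket listening on the
# given port.
port_listener() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The local address is host:port; the port is the last
            # colon-separated piece (this also handles ":::12203").
            n = split($4, parts, ":")
            if (parts[n] == port) print $7
        }'
}

# Sample input copied from the output above; on a real node use:
#   netstat -ltnp | port_listener 12203
printf 'tcp6 0 0 :::12203 :::* LISTEN 26462/docker-proxy\n' | port_listener 12203
# prints: 26462/docker-proxy
```

An empty result on a node means nothing is listening on that TCP port there, which matches what is seen with mode=ingress after the upgrade.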
It looks like I now need to create a global service and run it on all nodes in the cluster. But I want to get back the old behavior, where one container gets packets from all nodes via dockerd and the ingress network.
Output of docker version:
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:06:06 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:06:06 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info on master:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 6
Server Version: 17.05.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 41
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: rus10aj9e62s5kdiqpu8rdp6t
Is Manager: true
ClusterID: hqvohft3etj4ajnkgubbnjwzp
Managers: 4
Nodes: 18
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: <ip1-here>
Manager Addresses:
<ip2-here>:2377
<ip3-here>:2377
<ip1-here>:2377
<ip4-here>:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.963GiB
Name: dmgr-01
ID: VUNH:H6FP:CO4N:O6VB:CFCG:4T32:JQQV:SIS3:UGT7:V2FK:46VS:PRKZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: filiatixbot
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Output of docker info on worker:
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 17
Server Version: 17.05.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 137
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: qcwrqbby4jce906gjm1h1f3ts
Is Manager: false
Node Address: <ip1-here>
Manager Addresses:
<ip2-here>:2377
<ip3-here>:2377
<ip4-here>:2377
<ip5-here>:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
apparmor
Kernel Version: 4.4.0-47-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.67GiB
Name: dwrk-service-elk-01
ID: ZEOA:4TGE:ULAT:MZMP:SDYG:I3XT:ZHWC:OLWR:LSXF:6A73:ERN6:5QIB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: filiatixbot
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
Docker Swarm mode running on hybrid infrastructure with 4 master nodes and ~10 workers. Upgraded from 17.04.
About this issue
- State: open
- Created 7 years ago
- Comments: 39 (8 by maintainers)
I’m having a similar issue with a 4-node (Ubuntu) swarm running about 60 services (mainly logstash, influx, grafana, nginx in vip mode) after upgrading from 17.04 to 17.05: for some services the published port is not always available on some nodes. Redeploying the service or constraining it to a node doesn’t solve it; a rolling restart of the swarm makes it right again. [later edit] I had to downgrade to 17.04 and re-create the swarm, as the issue cascaded to the point where most services were unreachable.
I had the same problem after upgrading from 17.03 to 17.05. Like @one1zero1one, I had to completely destroy the swarm and re-create it from scratch. While it was still broken, I could see that the DOCKER-INGRESS filter chain in iptables was empty. I tried manually inserting suitable rules, but that wasn’t enough to get ingress working again.
(FYI, many of the services in my swarm had been created manually and I didn’t know of a way to export the service definitions. So I wrote a tool that dumps the services of a running swarm into docker-compose v3.2 files, ready for deploying into a new swarm via docker stack deploy. That helped enormously in getting back up and running. Will publish the tool soon!)
I have issues with ingress too.
If I use mode=host, it works:
I use firewalld but it is disabled.
Server Version: 17.12.1-ce
EDIT: I am unable to reproduce this on 2 KVM virtual machines… I think the issue is related to the hosting platform (Scaleway, non-standard kernel).
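For anyone who wants to check for the empty DOCKER-INGRESS chain mentioned in an earlier comment, a rough sketch is to count the rules that `iptables -S` reports for it. `count_chain_rules` is a hypothetical helper name. Which table to inspect is an assumption: the comment refers to the filter table, and Docker also keeps a DOCKER-INGRESS chain in the nat table, so it may be worth looking at both.

```shell
#!/bin/sh
# count_chain_rules (hypothetical helper): read `iptables -S <chain>` output
# on stdin and print how many rules are appended to the chain, i.e. the
# number of lines starting with "-A <chain> ".
count_chain_rules() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -c -- "^-A $1 " || true
}

# Sample input standing in for a chain that only has its declaration line.
# On a real node (as root), assuming the nat table is the one of interest:
#   iptables -t nat -S DOCKER-INGRESS | count_chain_rules DOCKER-INGRESS
printf '%s\n' '-N DOCKER-INGRESS' | count_chain_rules DOCKER-INGRESS
# prints: 0
```

A count of 0 on a node that should be serving published ports would be consistent with the broken state described above, although it does not by itself explain the cause.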
Same here with 18.01 and Ubuntu 16.04; an update would be much appreciated.
Last night I dropped my cluster, re-created it, and re-deployed all services with the same playbooks. Port publishing works as expected.
I think this was my last upgrade. Each new release brings more problems to the system.