moby: Creating a new service on swarm doesn't publish ports on CentOS 7

Output of docker version:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 5
 Running: 5
 Paused: 0
 Stopped: 0
Images: 6
Server Version: 1.12.1
Storage Driver: overlay2
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null overlay host
Swarm: active
 NodeID: 4aw23p0m4s1hazwr3jw3h5lr1
 Is Manager: true
 ClusterID: dowu6j9g030etwhzxd1rzv5b7
 Managers: 4
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 172.16.130.111
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.7.0-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 62.81 GiB
Name: www5.strippeddomain.com
ID: SSGC:WWQK:QTW2:T3OC:GAQW:RCZ7:YXAY:I5HY:XDWK:XIOS:BZWE:RTG6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.): physical machines with two network cards, one for the external network and one for the internal (cross-server communication) network.

Steps to reproduce the issue: These steps are taken directly from Docker Online webinar 42.

  1. Do a clean install of CentOS 7 (any release up to the current CentOS Linux release 7.2.1511 (Core))
  2. Install Docker Engine 1.12.*
  3. docker network create -d overlay collabnet
  4. docker service create --name wordpressdb1 --network collabnet -e MYSQL_ROOT_PASSWORD=mysql123 -e MYSQL_DATABASE=wordpress --replicas 2 mysql:latest
  5. docker service create -e WORDPRESS_DB_HOST=wordpressdb1 -e WORDPRESS_DB_PASSWORD=mysql123 --network collabnet --replicas 3 --name wordpressapp --publish 81:80/tcp wordpress:latest

Describe the results you received: Containers for all services and all replicas are spun up correctly, but port 81 does not become accessible.

Describe the results you expected: I expected to be able to reach the WordPress interface on port 81 of any node in the swarm cluster, but none of the nodes exposed this port.
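
The expected behaviour (port 81 answering on every node) can be checked with a small loop like the one below. This is a sketch that assumes curl is installed; check_published_port and check_all_nodes are hypothetical helper names. Any HTTP response counts as reachable (curl without -f exits 0 even on 4xx/5xx), while a refused or timed-out connection does not.

```shell
#!/bin/sh
# Sketch: report whether a published swarm port answers on each node.
# check_published_port and check_all_nodes are hypothetical helpers.

# Succeeds if something answers HTTP on host:port. Without -f, curl exits 0
# for any HTTP response; it fails on refused connections or timeouts.
check_published_port() {
    curl -s -o /dev/null --max-time 3 "http://$1:$2/"
}

# Usage: check_all_nodes PORT NODE...
# Prints one line per node and returns non-zero if any node is unreachable.
check_all_nodes() {
    port=$1
    shift
    failed=0
    for node in "$@"; do
        if check_published_port "$node" "$port"; then
            echo "$node:$port reachable"
        else
            echo "$node:$port NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}
```

For this setup that would be check_all_nodes 81 172.16.130.111 followed by the other nodes' addresses. With the swarm routing mesh working, every node should report the port as reachable.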

Additional information you deem important (e.g. issue happens only occasionally): I suspect a few things, though I'm not sure any of them make sense:

  1. It looks like IPv6 is enabled and may somehow be interfering with IPv4 port assignment.
  2. iptables on CentOS may be the culprit.
  3. I basically cannot start any container with a published port; none of them will publish the port.
  4. While docker service won't publish ports, a regular docker run still publishes ports correctly for newly started containers.
  5. In an email reply, @mgoelzer confirmed that this is a bug:

If I’m understanding correctly, a service started like docker service create -p 12345:12345 whatever-image /some/command/that/listens/on/12345 does not expose :12345 on all 8 of your nodes, right? Yes, that is definitely a bug. Can you open an issue about it on docker/docker and @-mention me? (@mgoelzer)

The issue may have to do with iptables, given this firewalld status output:

systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2016-09-01 01:36:25 UTC; 5h 34min ago
 Main PID: 18703 (firewalld)
   Memory: 32.7M
   CGroup: /system.slice/firewalld.service
           └─18703 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid

Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -j DOCKER' failed: iptables: No chain/target/match by that name.
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -j DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t nat -C POSTROUTING -s 172.19.0.0/16 ! -o docker_gwbridge -j MASQUERADE' failed: iptables: No chain/target/match by that name.
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t nat -C DOCKER -i docker_gwbridge -j RETURN' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -D FORWARD -i docker_gwbridge -o docker_gwbridge -j ACCEPT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -o docker_gwbridge -j DOCKER' failed: iptables: No chain/target/match by that name.
Sep 01 01:36:26 www5.domain123.com firewalld[18703]: 2016-09-01 01:36:26 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -D FORWARD -i docker0 -o docker0 -j DROP' failed: iptables: Bad rule (does a matching rule exist in that chain?).

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 3
  • Comments: 56 (5 by maintainers)

Most upvoted comments

Yes, confirmed: with firewalld disabled, everything seems to work OK.

The comment from @outofcoffee was spot on. It didn't do the trick for me, but it got me looking at firewalld zones, and here is what I found:

  1. I needed to assign the interfaces to the appropriate zones:
    1. trusted (active) interfaces: docker0 docker_gwbridge eth1
    2. public (active) interfaces: eth0
  2. I had to look into the differences between the --permanent and non-permanent (runtime) configuration. For some reason I had to configure the proper zones for all interfaces in both. I assumed --permanent is the configuration that sticks after a reboot, but in my case the non-permanent configuration was reloaded alongside mismatched --permanent zone rules.
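
The zone layout from point 1 can be applied with firewall-cmd roughly as follows. This is a sketch, not a verified fix: the interface names are the ones listed above (eth0/eth1 will differ on other hosts), the run and assign_zones helpers are hypothetical, and setting DRY_RUN=1 only prints the commands. Each interface is bound both in the runtime configuration and with --permanent, to avoid the runtime/permanent mismatch described in point 2.

```shell
#!/bin/sh
# Sketch: bind interfaces to firewalld zones in both runtime and permanent config.
# Interface lists are taken from the comment; adjust them for your hosts.
TRUSTED_IFACES="docker0 docker_gwbridge eth1"
PUBLIC_IFACES="eth0"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: bind each interface to its zone.
assign_zones() {
    for iface in $TRUSTED_IFACES; do
        # Runtime binding: takes effect immediately but is not saved.
        run firewall-cmd --zone=trusted --change-interface="$iface"
        # Permanent binding: saved, applied after a reload or reboot.
        run firewall-cmd --permanent --zone=trusted --add-interface="$iface"
    done
    for iface in $PUBLIC_IFACES; do
        run firewall-cmd --zone=public --change-interface="$iface"
        run firewall-cmd --permanent --zone=public --add-interface="$iface"
    done
}
```

Afterwards, firewall-cmd --get-active-zones should list the interfaces under trusted and public as shown above. Note that --permanent --add-interface fails if the interface is already permanently bound to another zone, and for interfaces managed by NetworkManager the zone may come from a ZONE= line in the interface's ifcfg file instead, so check both if the bindings do not stick.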

Now things seem to work, but I'm hesitant to switch production over to Docker services, since firewalld is not something I feel comfortable debugging.

I guess the issue is somehow related to the zones. It would be great if Docker could provide sample rulesets for all the zones it needs; at this point I'm not really clear on how it would need to be set up if the server had only one network interface... Super confused, off to read the firewalld manuals.

We are continuing to see this issue with 1.12.2-rc1 on Debian jessie with a 3.18.21 kernel.

# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.2-rc1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 3
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: active
 NodeID: bms42xsdsbv3lahs1i5ynpr8k
 Is Manager: true
 ClusterID: 5ctkuppqmls0okmypggfbni3s
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.1.251
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.18.21
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 256 MiB
Name: test01
ID: MMHP:NEYA:2UFU:PQPQ:QIRR:VRD4:3TLW:N6GX:76ZE:GESH:OKYM:LZ4U
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

# docker swarm init
Swarm initialized: current node (c4qb0ch0n52nd7hxkn8zg49ez) is now a manager.
...

# docker service create --name my_web --replicas 3 --publish 8080:80 nginx
bee8dzp9tgwvoulw7l28y3cna

# docker service ls
ID            NAME    REPLICAS  IMAGE  COMMAND
9b4o5ngf7xx1  my_web  0/3       nginx

A broken service logs these errors:

Sep 27 20:58:30 test01 dockerd[99]: time="2016-09-27T20:58:30.659574742Z" level=error msg="Failed to add ingress: failed to find gateway bridge interface name for <nil>: numerical result out of range"
Sep 27 20:58:30 test01 dockerd[99]: time="2016-09-27T20:58:30.659812688Z" level=error msg="Failed to create real server 10.255.0.7 for vip 10.255.0.4 fwmark 2865 in sb ingress-sbox: no such process"
Sep 27 20:58:30 test01 dockerd[99]: time="2016-09-27T20:58:30.691847913Z" level=warning msg="Could not rollback container connection to network ingress"
Sep 27 20:58:30 test01 dockerd[99]: time="2016-09-27T20:58:30.706289781Z" level=info msg="Failed to delete real server 10.255.0.7 for vip 10.255.0.4 fwmark 2865: no such process"
Sep 27 20:58:30 test01 dockerd[99]: time="2016-09-27T20:58:30.706326454Z" level=error msg="Failed to delete a new service for vip 10.255.0.4 fwmark 2865: no such process"
Sep 27 20:58:30 test01 dockerd[99]: time="2016-09-27T20:58:30.713753137Z" level=info msg="setting up rule failed, [-t nat -D DOCKER-INGRESS -p tcp --dport 8080 -j DNAT --to-destination <nil>:8080]:  (iptables failed: iptables --wait -t nat -D DOCKER-INGRESS -p tcp --dport 8080 -j DNAT --to-destination <nil>:8080: iptables v1.4.21: Bad IP address \"<nil>\"\n\nTry `iptables -h' or 'iptables --help' for more information.\n (exit status 2))"

Only a single entry is present in the DOCKER-INGRESS chain (i.e. there is no DNAT entry):

# iptables -t nat -L DOCKER-INGRESS -n -v
Chain DOCKER-INGRESS (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    1    64 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0 
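
To check for this condition without reading the table by eye, the chain can be dumped in rule-spec form with iptables -t nat -S DOCKER-INGRESS and searched for a DNAT rule on the published port. The sketch below assumes that a healthy chain contains a rule of the same shape as the one Docker tries to manage in the log above (-p tcp --dport 8080 -j DNAT --to-destination ...); has_ingress_dnat is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect whether DOCKER-INGRESS has a DNAT rule for a published port.
# has_ingress_dnat is a hypothetical helper: it reads "iptables -S" style
# rules on stdin and succeeds only if a "--dport PORT ... -j DNAT" rule exists.
has_ingress_dnat() {
    grep -Eq -e "--dport $1 .*-j DNAT"
}

# Example (needs root on a swarm node):
#   iptables -t nat -S DOCKER-INGRESS | has_ingress_dnat 8080 \
#       || echo "no DNAT rule for port 8080 in DOCKER-INGRESS"
```

On the broken node above, the chain holds only the RETURN rule, so the check fails for port 8080.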

A curl to the local service will fail:

# curl http://127.0.0.1:8080
curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection refused