moby: Multi-host overlay networking doesn't work on CentOS 7, Docker 1.12.1

It seems that on AWS (t2.micro nodes in the same VPC, same subnet, same security group), multi-host networking over an overlay network is broken for me. The instances were freshly deployed, manually updated to the latest packages, and had Docker installed via the yum repo.
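(For reference, the exact install commands weren't recorded; they would have been roughly the following, with the repo definition taken from the docker-engine packaging docs of that era, so treat it as an assumption rather than a verbatim transcript.)

# Assumed install steps: add the docker-engine yum repo and install (run as root)
tee /etc/yum.repos.d/docker.repo <<'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
yum install -y docker-engine
systemctl enable docker
systemctl start docker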

Output of docker version:

Manager:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Worker:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

Manager:

[root@ip-10-1-4-58 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@ip-10-1-4-58 ~]# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.1
Storage Driver: devicemapper
 Pool Name: docker-202:1-16841715-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 16.91 MB
 Data Space Total: 107.4 GB
 Data Space Available: 7.172 GB
 Metadata Space Used: 585.7 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay bridge null host
Swarm: active
 NodeID: 11eunhpds5mxzacx1cmxch24t
 Is Manager: true
 ClusterID: 1ua5zojygfm6yvuayjc3mv962
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.1.4.58
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.28.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 991.6 MiB
Name: ip-10-1-4-58.aws.npage.internal
ID: 5SWD:Y67L:3TVI:35K6:M4AI:BRQO:SYBM:4OUJ:BTGK:N4A7:HBUR:GLTD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
[root@ip-10-1-4-58 ~]# uname -r
3.10.0-327.28.3.el7.x86_64
[root@ip-10-1-4-58 ~]# /sbin/iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-ISOLATION  all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (2 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Worker:

[root@ip-10-1-4-59 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@ip-10-1-4-59 ~]# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.1
Storage Driver: devicemapper
 Pool Name: docker-202:1-16841715-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 16.91 MB
 Data Space Total: 107.4 GB
 Data Space Available: 7.142 GB
 Metadata Space Used: 585.7 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: active
 NodeID: 4wpobknxmff4j4jairuisu5vi
 Is Manager: false
 Node Address: 10.1.4.59
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.28.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 991.6 MiB
Name: ip-10-1-4-59.aws.npage.internal
ID: 7OKG:T7XY:YHLB:UAIP:3JS7:WCXZ:HDP4:6NNL:XJQT:BTUZ:UJMO:NJ4W
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
[root@ip-10-1-4-59 ~]# uname -r
3.10.0-327.28.3.el7.x86_64
[root@ip-10-1-4-59 ~]#
[root@ip-10-1-4-59 ~]# /sbin/iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-ISOLATION  all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (2 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere
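(The listings above only cover the iptables filter table; on CentOS 7 it is also worth double-checking that firewalld isn't active with rules of its own and that the swarm control ports are actually bound. A quick check, run on either node:)

# firewalld can install rules beyond what a plain `iptables --list` suggests
systemctl is-active firewalld
# the cluster-management (2377/tcp) and gossip (7946/tcp+udp) ports should be bound by dockerd
ss -lntu | grep -E '2377|7946'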

Additional environment details (AWS, VirtualBox, physical, etc.):

Pair of t2.micro instances in a common subnet in a common vpc sharing a security group that allows all ports in and out.
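For swarm-mode overlay networking, the ports that matter between the nodes are TCP 2377 (cluster management), TCP and UDP 7946 (gossip), and UDP 4789 (VXLAN data plane). The all-open security group here should already cover them, but for a locked-down group the rules would look roughly like this (the group ID is a placeholder):

SG=sg-0123456789abcdef0   # placeholder security group ID
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol tcp --port 2377 --source-group "$SG"
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol tcp --port 7946 --source-group "$SG"
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol udp --port 7946 --source-group "$SG"
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol udp --port 4789 --source-group "$SG"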

Steps to reproduce the issue:

  1. Create the swarm and join a worker to the manager (the exact init/join commands aren't shown; a sketch follows the node output below)
[root@ip-10-1-4-58 ~]# docker node ls
ID                           HOSTNAME                             STATUS  AVAILABILITY  MANAGER STATUS
11eunhpds5mxzacx1cmxch24t *  ip-10-1-4-58.aws.npage.internal  Ready   Active        Leader
4wpobknxmff4j4jairuisu5vi    ip-10-1-4-59.aws.npage.internal  Ready   Active
[root@ip-10-1-4-58 ~]# docker node inspect 11eunhpds5mxzacx1cmxch24t
[
    {
        "ID": "11eunhpds5mxzacx1cmxch24t",
        "Version": {
            "Index": 10
        },
        "CreatedAt": "2016-09-05T20:04:28.683647918Z",
        "UpdatedAt": "2016-09-05T20:04:28.730611367Z",
        "Spec": {
            "Role": "manager",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "ip-10-1-4-58.aws.npage.internal",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 1000000000,
                "MemoryBytes": 1039716352
            },
            "Engine": {
                "EngineVersion": "1.12.1",
                "Plugins": [
                    {
                        "Type": "Network",
                        "Name": "bridge"
                    },
                    {
                        "Type": "Network",
                        "Name": "host"
                    },
                    {
                        "Type": "Network",
                        "Name": "null"
                    },
                    {
                        "Type": "Network",
                        "Name": "overlay"
                    },
                    {
                        "Type": "Volume",
                        "Name": "local"
                    }
                ]
            }
        },
        "Status": {
            "State": "ready"
        },
        "ManagerStatus": {
            "Leader": true,
            "Reachability": "reachable",
            "Addr": "10.1.4.58:2377"
        }
    }
]
[root@ip-10-1-4-58 ~]# docker node ls
ID                           HOSTNAME                             STATUS  AVAILABILITY  MANAGER STATUS
11eunhpds5mxzacx1cmxch24t *  ip-10-1-4-58.aws.npage.internal  Ready   Active        Leader
4wpobknxmff4j4jairuisu5vi    ip-10-1-4-59.aws.npage.internal  Ready   Active
[root@ip-10-1-4-58 ~]# docker node inspect 4wpobknxmff4j4jairuisu5vi
[
    {
        "ID": "4wpobknxmff4j4jairuisu5vi",
        "Version": {
            "Index": 16
        },
        "CreatedAt": "2016-09-05T20:05:04.264151429Z",
        "UpdatedAt": "2016-09-05T20:05:04.287774376Z",
        "Spec": {
            "Role": "worker",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "ip-10-1-4-59.aws.npage.internal",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 1000000000,
                "MemoryBytes": 1039716352
            },
            "Engine": {
                "EngineVersion": "1.12.1",
                "Plugins": [
                    {
                        "Type": "Network",
                        "Name": "bridge"
                    },
                    {
                        "Type": "Network",
                        "Name": "host"
                    },
                    {
                        "Type": "Network",
                        "Name": "null"
                    },
                    {
                        "Type": "Network",
                        "Name": "overlay"
                    },
                    {
                        "Type": "Volume",
                        "Name": "local"
                    }
                ]
            }
        },
        "Status": {
            "State": "ready"
        }
    }
]
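(The init/join commands for step 1 weren't captured; on these hosts they would have been along these lines, with the join token being a placeholder:)

# on the manager (10.1.4.58)
docker swarm init --advertise-addr 10.1.4.58
# on the worker (10.1.4.59), using the token printed by the init command
docker swarm join --token SWMTKN-1-<placeholder> 10.1.4.58:2377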
  2. Create an overlay network (with an IP space that doesn't overlap, just in case)
[root@ip-10-1-4-58 ~]# docker network create --driver overlay --subnet 192.168.1.0/24 testnet
5inx7s3lv18th4v9yiny7q0rt
[root@ip-10-1-4-58 ~]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
8b504cddad1c        bridge              bridge              local
8cac0c681e4f        docker_gwbridge     bridge              local
3c458f987de8        host                host                local
cnfstsj9bfsv        ingress             overlay             swarm
b5650f9a0a03        none                null                local
5inx7s3lv18t        testnet             overlay             swarm
[root@ip-10-1-4-58 ~]# docker network inspect testnet
[
    {
        "Name": "testnet",
        "Id": "5inx7s3lv18th4v9yiny7q0rt",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "192.168.1.0/24",
                    "Gateway": "192.168.1.1"
                }
            ]
        },
        "Internal": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "257"
        },
        "Labels": null
    }
]
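One detail worth noting (not shown above): at this point testnet only exists on the manager. Swarm-scoped overlays are extended to a worker lazily, when the first task attached to them is scheduled there, so the worker isn't expected to list the network until the services in the next step start.

# on the worker, before any task on testnet runs here, the network is normally absent
docker network ls | grep testnet || echo "testnet not present yet"
# once a task attached to testnet starts on this node, it should appear with scope "swarm"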
  3. Create busybox services
[root@ip-10-1-4-58 ~]# docker service ls
ID  NAME  REPLICAS  IMAGE  COMMAND
[root@ip-10-1-4-58 ~]# docker service create --name busy1 --network testnet busybox sleep 5000
f369m2axqnzqqd03a7syb8bfi
[root@ip-10-1-4-58 ~]# docker service create --name busy2 --network testnet busybox sleep 5000
3p762eve0w7eryyv2lo90u5pa
[root@ip-10-1-4-58 ~]# docker service ls
ID            NAME   REPLICAS  IMAGE    COMMAND
3p762eve0w7e  busy2  1/1       busybox  sleep 5000
f369m2axqnzq  busy1  1/1       busybox  sleep 5000
[root@ip-10-1-4-58 ~]# docker service inspect busy1
[
    {
        "ID": "f369m2axqnzqqd03a7syb8bfi",
        "Version": {
            "Index": 44
        },
        "CreatedAt": "2016-09-05T20:14:02.880198898Z",
        "UpdatedAt": "2016-09-05T20:14:02.881754501Z",
        "Spec": {
            "Name": "busy1",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "busybox",
                    "Args": [
                        "sleep",
                        "5000"
                    ]
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "Networks": [
                {
                    "Target": "5inx7s3lv18th4v9yiny7q0rt"
                }
            ],
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "5inx7s3lv18th4v9yiny7q0rt",
                    "Addr": "192.168.1.2/24"
                }
            ]
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]
[root@ip-10-1-4-58 ~]# docker service inspect busy2
[
    {
        "ID": "3p762eve0w7eryyv2lo90u5pa",
        "Version": {
            "Index": 52
        },
        "CreatedAt": "2016-09-05T20:14:07.288219969Z",
        "UpdatedAt": "2016-09-05T20:14:07.290038997Z",
        "Spec": {
            "Name": "busy2",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "busybox",
                    "Args": [
                        "sleep",
                        "5000"
                    ]
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "Networks": [
                {
                    "Target": "5inx7s3lv18th4v9yiny7q0rt"
                }
            ],
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "5inx7s3lv18th4v9yiny7q0rt",
                    "Addr": "192.168.1.4/24"
                }
            ]
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]
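(Before the ping test it is worth confirming, from the manager, that the two tasks really did land on different nodes; the per-host docker ps output below shows the same thing, but from the manager it should be something like:)

# show which node each service's task was scheduled on
docker service ps busy1
docker service ps busy2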
  4. Try to ping across hosts

Manager:

[root@ip-10-1-4-58 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
f6a1d8595e91        busybox:latest      "sleep 5000"        About a minute ago   Up About a minute                       busy1.1.4iz6o2bqc36epxfcgdkivtcam
[root@ip-10-1-4-58 ~]# docker exec -it f6a1d8595e91 /bin/sh
/ # nslookup busy2
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      busy2
Address 1: 192.168.1.4 ip-192-168-1-4.ec2.internal
/ # ping busy2
PING busy2 (192.168.1.4): 56 data bytes
^C
--- busy2 ping statistics ---
27 packets transmitted, 0 packets received, 100% packet loss

Worker:

[root@ip-10-1-4-59 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
52cc44b7ab8a        busybox:latest      "sleep 5000"        About a minute ago   Up About a minute                       busy2.1.0xm1qks7yefj89rhha8bs6mmv
[root@ip-10-1-4-59 ~]# docker exec -it 52cc44b7ab8a /bin/sh
/ # nslookup busy1
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      busy1
Address 1: 192.168.1.2 ip-192-168-1-2.ec2.internal
/ # ping busy1
PING busy1 (192.168.1.2): 56 data bytes
^C
--- busy1 ping statistics ---
32 packets transmitted, 0 packets received, 100% packet loss
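A useful extra check at this point (not part of the original report) is whether the ICMP actually leaves the host as VXLAN-encapsulated UDP and arrives on the peer; if it shows up on the sender's interface but never on the receiver's, something in between is dropping UDP 4789. Assuming the instances' primary interface is eth0:

# run on both hosts while repeating the ping from inside the container
tcpdump -ni eth0 udp port 4789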

Describe the results you received:

Ping between containers on different hosts but on the same overlay network doesn't work.

Describe the results you expected:

I would expect connectivity between containers on different hosts.

Additional information you deem important (e.g. issue happens only occasionally):

The overlay communication works as expected on the same host:

Manager:

[root@ip-10-1-4-58 ~]# docker service create --name busy3 --network testnet busybox sleep 5000
7kkjhz515an3yclnz0pzcckee
[root@ip-10-1-4-58 ~]# docker service inspect busy3
[
    {
        "ID": "7kkjhz515an3yclnz0pzcckee",
        "Version": {
            "Index": 60
        },
        "CreatedAt": "2016-09-05T20:30:12.741675622Z",
        "UpdatedAt": "2016-09-05T20:30:12.743275433Z",
        "Spec": {
            "Name": "busy3",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "busybox",
                    "Args": [
                        "sleep",
                        "5000"
                    ]
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "Networks": [
                {
                    "Target": "5inx7s3lv18th4v9yiny7q0rt"
                }
            ],
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "5inx7s3lv18th4v9yiny7q0rt",
                    "Addr": "192.168.1.6/24"
                }
            ]
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]

Worker:

[root@ip-10-1-4-59 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
0d774716b3a9        busybox:latest      "sleep 5000"        About a minute ago   Up About a minute                       busy3.1.aro8xtxieqmv3xwai18wmjvg0
52cc44b7ab8a        busybox:latest      "sleep 5000"        17 minutes ago       Up 17 minutes                           busy2.1.0xm1qks7yefj89rhha8bs6mmv
[root@ip-10-1-4-59 ~]# docker exec -it 0d774716b3a9 /bin/sh
/ # ping busy3
PING busy3 (192.168.1.6): 56 data bytes
64 bytes from 192.168.1.6: seq=0 ttl=64 time=0.032 ms
64 bytes from 192.168.1.6: seq=1 ttl=64 time=0.053 ms
64 bytes from 192.168.1.6: seq=2 ttl=64 time=0.067 ms
^C
--- busy3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.032/0.050/0.067 ms
/ # ping busy2
PING busy2 (192.168.1.4): 56 data bytes
64 bytes from 192.168.1.4: seq=0 ttl=64 time=0.096 ms
64 bytes from 192.168.1.4: seq=1 ttl=64 time=0.078 ms
64 bytes from 192.168.1.4: seq=2 ttl=64 time=0.088 ms
64 bytes from 192.168.1.4: seq=3 ttl=64 time=0.077 ms
^C
--- busy2 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.077/0.084/0.096 ms
/ # ping busy1
PING busy1 (192.168.1.2): 56 data bytes
^C
--- busy1 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 23 (6 by maintainers)

Most upvoted comments

I'm seeing the exact same issue that @fishnix reported (essentially identical repro steps, ping not working across hosts), but on CentOS 7 in Aliyun (a Chinese provider), so it's not an EC2-only issue. Similarly, everything works fine with two VMs on my dev machine. Fresh installs of v17.03.0-ce. Contrary to @gittycat's steps, creating the network without a subnet doesn't help. Rebooting all nodes and rebuilding the network doesn't help.

Was careful to select a subnet that doesn't overlap with any of the host's routes, as is necessary on kernels < 3.16. Of course Aliyun doesn't make this easy, as they "helpfully" define routes for nearly the entire RFC1918 space:

[host:~]$ ip r
default via <snip> dev eth1 
10.0.0.0/8 via <snip2> dev eth0 
<snip2>/22 dev eth0  proto kernel  scope link  src <snip2>
100.64.0.0/10 via <snip2> dev eth0 
<snip>/22 dev eth1  proto kernel  scope link  src <snip>
169.254.0.0/16 dev eth0  scope link  metric 1002 
169.254.0.0/16 dev eth1  scope link  metric 1003 
172.16.0.0/12 via <snip2> dev eth0 
192.168.0.0/20 dev docker0  proto kernel  scope link  src 192.168.0.1 
192.168.16.0/20 dev docker_gwbridge  proto kernel  scope link  src 192.168.16.1 

…so I chose 192.168.124.0/22.

At this point I’m not sure what’s more work, switching from Centos 7 to Ubuntu 16 or making do without overlay networking (i.e. rolling my own service discovery and setting up internal load balancers where I needed VIPs).

Edit: Found a workaround, which might be relevant to EC2 as well depending on how they handle VXLAN traffic. It looks like Aliyun blocks VXLAN traffic between certain subnets, even if it doesn't go onto the public internet. I used tcpdump -n -XX src host <some_other_machine> to determine that the pings were being relayed between hosts as VXLAN traffic (not plain UDP or TCP), and then noticed that some hosts could communicate with each other while others couldn't.

Workaround: create the overlay network with --opt encrypted=true. This shifts the traffic to ESP, which apparently is OK.
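Applied to the original reporter's AWS setup, that workaround would look roughly like the sketch below. Note that an encrypted overlay carries its data plane as ESP (IP protocol 50), so that protocol must also be allowed between the hosts; the security group rule is a placeholder example.

# recreate the overlay with IPSec encryption (services attached to the old
# network have to be removed and recreated against the new one first)
docker network rm testnet
docker network create --driver overlay --opt encrypted --subnet 192.168.1.0/24 testnet
# on AWS, allow ESP (IP protocol 50) between members of the nodes' security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol 50 --source-group sg-0123456789abcdef0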

@adriano-fonseca just adding a +1 isn't very constructive and is unlikely to help you any further. If you mean that you have exactly the same issue, please add more details. If you have a similar issue and suspect it's a bug, open a new issue with as many details as possible.