moby: Upgrade to 20.10 breaks swarm network

Description

Steps to reproduce the issue:

  1. Install Docker 19.03 on Ubuntu 20.04 or CentOS 8
  2. Initialize a swarm
  3. Start some services with docker stack deploy
  4. Upgrade Docker from 19.03 to 20.10

Describe the results you received: Containers of the services can’t start; there is an error in the logs:

Dec 10 06:21:03 dockerd[3160859]: time="2020-12-10T06:21:03.150920367Z" level=error msg="fatal task error" error="starting container failed: container 9f93a21ac2e3be11a65c91f3cfde555a415eea47c636bef432d5d2e4b08afff4: endpoint create on GW Network failed: failed to create endpoint gateway_f8cabe848464 on network docker_gwbridge: network 28d599d44202f2acdc85e42437332ddb41a81bd7f0622bc0724761ec9b49082a does not exist" module=node/agent/taskmanager node.id=u7qdqny1doho69k3nariuo1ru service.id=vhtg6aoyt360k7mluiwmshqf0 task.id=aqgsfcw88m5kujnrly74o4wh4

Describe the results you expected: Containers are running.

Additional information you deem important (e.g. issue happens only occasionally): We have two installations with this issue, both of which appeared after the upgrade to 20.10. Recreating the services didn’t help; re-initializing the swarm didn’t help either.

# docker network list
NETWORK ID     NAME              DRIVER    SCOPE
e45b9b63c4ae   bridge            bridge    local
28d599d44202   docker_gwbridge   bridge    local
2aa80dc0cc04   host              host      local
w9rpuika2x0d   ingress           overlay   swarm
62f0fb2fdf28   none              null      local
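
Note that the ID in the error (28d599d44202…) matches the docker_gwbridge shown in the list above, so the bridge network still exists even though the swarm agent reports it as missing. A quick way to compare the two on an affected node (diagnostic only, not a fix):

# print the full ID of the current docker_gwbridge and compare it with the ID in the error
docker network inspect docker_gwbridge --format '{{.Id}}'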

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.0
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        7287ab3
 Built:             Tue Dec  8 18:59:40 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.0
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       eeddea2
  Built:            Tue Dec  8 18:57:45 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 12
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: u7qdqny1doho69k3nariuo1ru
  Is Manager: true
  ClusterID: rdq2vi44m2lkz34tdow1dvip4
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 127.0.0.1
  Manager Addresses:
   127.0.0.1:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-56-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.749GiB
 Name: cloud.filesanctuary.net
 ID: JBWW:XVUE:3XW4:OQYT:HJHK:OSRV:PFHK:PFZP:S3DV:HPZ7:NYWD:OWQO
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
WARNING: No blkio weight support
WARNING: No blkio weight_device support

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 11
  • Comments: 65 (4 by maintainers)

Most upvoted comments

We also have connectivity problems in our Docker swarm (3 Red Hat 8.3 VM nodes). The services running in the containers are not accessible through the swarm-mode routing mesh, only via the explicit host IP.

After some investigation, we found that the problem is related to the UDP port 4789 packets that Docker uses to carry overlay traffic between swarm nodes: these packets are dropped by the source node and never reach the destination node.

To resolve this issue we had to disable the following offload feature:

ethtool -K [network] tx-checksum-ip-generic off

update: similar problem https://github.com/flannel-io/flannel/issues/1279

I can confirm this is exactly the solution, at least in my case (CentOS 8.3 Stream, Docker 20.10.5) on VMware ESXi 6.7:

ethtool -K ens192 tx-checksum-ip-generic off

After executing this on all swarm machines, the routing mesh works again! This seems to be reboot-safe.

cheers @txtdevelop !

Edit/PS: it may be reboot-safe, but after a recent dnf update the setting was lost again. For anyone needing it (ETHTOOL_OPTS= does not seem to be recognized on CentOS 8 Stream when using NetworkManager):

cat > /etc/NetworkManager/dispatcher.d/pre-up.d/10-tx-checksum-ip-generic <<'EOF'
#!/bin/sh
ethtool -K ens192 tx-checksum-ip-generic off
EOF
chmod +x /etc/NetworkManager/dispatcher.d/pre-up.d/10-tx-checksum-ip-generic

This makes the setting persistent.

check:

ethtool --show-offload ens192 | grep tx-checksum-ip-generic
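
When the workaround is active, that check should report something like:

tx-checksum-ip-generic: off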

Hello everyone,

I was having the exact same issue on a swarm cluster built of Ubuntu 20.04.4 LTS VMs running on ESXi 6.7. I spent countless hours troubleshooting it. My main focus was iptables, since that made the most sense to me.

However, in my case, running the command below on all cluster nodes immediately fixed my problem. Now, ingress publishing works like a charm!

sudo ethtool -K <interface> tx-checksum-ip-generic off
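
If you are not sure which interface carries the swarm traffic, the route toward another node reveals it; the dev field in the output is the interface to pass to ethtool (the address below is just a placeholder for any other node’s IP):

ip -o route get 192.0.2.10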

It’s worth trying!

Best regards, Ivan Spasov

I was trying to set up a Swarm over a Hetzner private network (using a vSwitch).

Mesh routing was not working; I could only make it work in global/host mode. I tried everything with the firewall and the ethtool workarounds listed above, tried changing the Linux distro (AlmaLinux 8 and Debian 11), and had zero luck.

Then I found this comment on Reddit, which pretty much saved my life.

So, if you can’t get Swarm working over a Hetzner network and you have already tried everything, check your MTUs: you need to adjust the Docker networks’ MTU so that it is lower than or equal to 1450, which is the Hetzner VLAN MTU.
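
For user-defined overlay networks this can be set at creation time with the MTU driver option (a rough sketch; the network name is illustrative, and the ingress network would have to be removed and recreated with --ingress to change it the same way):

docker network create \
  --driver overlay \
  --opt com.docker.network.driver.mtu=1450 \
  my_overlay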

We’ve run into this issue as well. The strange thing is that we had two essentially identical environments: one has the issue, the other works fine.

These are the package versions we’re using, but it’s probably not the packages themselves, since one environment runs exactly these versions and works fine.

# dpkg -l container* docker* linux-image* |grep ^ii
ii  containerd.io                        1.4.13-1                       amd64        An open and reliable container runtime
ii  docker-ce                            5:20.10.11~3-0~debian-bullseye amd64        Docker: the open-source application container engine
ii  docker-ce-cli                        5:20.10.11~3-0~debian-bullseye amd64        Docker CLI: the open-source application container engine
ii  linux-image-5.10.0-11-amd64          5.10.92-2                      amd64        Linux 5.10 for 64-bit PCs (signed)

The symptom is that the overlay network doesn’t work. The way to test this is with tcpdump:

# tcpdump -i ens224 -n port 4789 
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens224, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:46:04.028007 IP 10.xx.xx.1.34013 > 10.xx.xx.2.4789: VXLAN, flags [I] (0x08), vni 40xx
IP 10.xx.xx.6.35616 > 10.xx.xx.7.5555: Flags [S], seq 1429907708, win 64860, options [mss 1410,sackOK,TS val 974838813 ecr 0,nop,wscale 7], length 0

When it’s broken you only see packets going out, but no packets coming in. You need to have some containers running to trigger overlay traffic.
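
If your tcpdump supports direction filtering (-Q is not available in very old builds), the asymmetry is even easier to see:

# capture inbound VXLAN traffic only; this stays silent on an affected node
tcpdump -i ens224 -n -Q in udp port 4789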

In our case we needed to do

ethtool -K ens224 tx-checksum-ip-generic off

on all swarm hosts. We added it to /etc/network/interfaces as a pre-up command to fix this, as shown below.
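
For reference, the ifupdown stanza can look like this (the interface name and addressing method are illustrative):

# /etc/network/interfaces
auto ens224
iface ens224 inet dhcp
    pre-up ethtool -K ens224 tx-checksum-ip-generic off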

The only difference we’ve found is that one environment is VM hardware version 13 (which works) and the other is version 14 (which doesn’t). We found a reference to VMware PR 2766401, which describes a bug causing the vmxnet driver to drop packets. This is apparently fixed from VM hardware version 15 onward.

So our hypothesis is that if you’re running VM hardware version 14 with the Debian 5.10.92-2 kernel it breaks, but with an older kernel (in our case 4.19.98-1+deb10u1) or an older VM hardware version it works fine.

For reference, these versions worked everywhere for us (prior to upgrading)

ii  containerd.io                       1.4.4-1                      amd64        An open and reliable container runtime
ii  docker-ce                           5:19.03.3~3-0~debian-stretch amd64        Docker: the open-source application container engine
ii  docker-ce-cli                       5:19.03.3~3-0~debian-stretch amd64        Docker CLI: the open-source application container engine
ii  linux-image-4.19.0-8-amd64          4.19.98-1+deb10u1            amd64        Linux 4.19 for 64-bit PCs (signed)

For reference, we were running VMware ESXi, 7.0.3, 19193900 on both environments.

Encountering this same issue, with the caveat that the tx-checksum-ip-generic off fix doesn’t seem to work for me.

The ethtool tx-checksum-ip-generic workaround works for us on a Docker Swarm worker node with CentOS 8.3 and Docker 20.10.5. Thank you @sgohl!

I use Swarm and I had network connectivity issues right after migrating to Docker 20.10.x. After struggling a bit, I was able to find the problem and fix it.

I use overlay networks for my swarm services, and it’s very common for my services to be attached to several networks. That basically means a service has several IPs (one for each of its networks).

In the example below, my nginx service has one hostname (server1) but at least two IPs (ip1 in network net1 and ip2 in network net2).

services:
  nginx:
    image: nginx:latest
    hostname: server1
    networks:
      - net1
      - net2

# overlay networks referenced above
networks:
  net1:
    driver: overlay
  net2:
    driver: overlay

Now, here comes the interesting part: Docker 19.03.x and Docker 20.10.x behave differently when it comes to resolving the IP of the host server1:

Docker 19.03.x ALWAYS returns the same IP (which can be either ip1 or ip2 in my example above),

whereas Docker 20.10.x returns ip1 and ip2 ALTERNATELY (round-robin).
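
A rough way to observe this is to repeat the lookup from another container attached to the same networks and compare which address comes back each time (the container name here is a placeholder, and getent is assumed to be available in that image):

docker exec some_client_container getent hosts server1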

Now, my problem was that I was using hostnames in my services, using Go primitives such as ResolveTCPAddr to get the IP and connect to other services, and using a library such as pollon (https://github.com/sorintlab/pollon/blob/248c68238c160c056cd51d0e782276cef5c64ce4/pollon.go#L130) to track IP changes and re-initialize connections each time an IP change was detected.

So, since Docker 20.10 now returns a different IP after each DNS request when a service has several IPs, I was endlessly losing connections…

After realizing this, I had to modify the code of my services to take into account this new behavior.

I don’t know if what I’m describing here is related to this issue; I’m just giving feedback on the problems I had during my Docker migration, in case it helps someone.

I had automatic updates enabled on my manager nodes, and out of nowhere they started consistently failing overnight last week. I believe it’s because they automatically updated to a 5.x kernel. I have tried disabling kernel updates and will post my findings.

@regbo are your overlay networks configured with the encryption option?

I’ve failed to reproduce this using unencrypted networks (and successfully reproduced it using encrypted networks). I think it is safe to conclude that there is a correlation between encrypted networks and the kernel update. I’ve attached a simplified stack.yaml file for reference.

version: '3.7'

networks:
  encrypted:
    attachable: true
    driver: overlay
    driver_opts:
      encrypted: ""
  unencrypted:
    attachable: true
    driver: overlay

services:
  encrypted:
    image: "nginxdemos/hello"
    deploy:
      mode: global
    networks:
      - encrypted
    stdin_open: true
    tty: true
  unencrypted:
    image: "nginxdemos/hello"
    deploy:
      mode: global
    networks:
      - unencrypted
    stdin_open: true
    tty: true
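
A minimal way to exercise the stack (the stack name and the curlimages/curl test image are illustrative; the networks are attachable, so a throwaway container can probe each service through its VIP):

docker stack deploy -c stack.yaml enctest
docker run --rm --network enctest_unencrypted curlimages/curl -s http://enctest_unencrypted/
docker run --rm --network enctest_encrypted curlimages/curl -s http://enctest_encrypted/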

I’ve created a new issue (https://github.com/moby/moby/issues/43443) to prevent confounding the two potentially different issues here. If it is later determined to be the same issue, we can rejoin them then.

We recently encountered a similar issue on Azure: containers could ping containers on other nodes, but other traffic (e.g. curl, mysql) would hang indefinitely. Our issue was resolved by a different solution than those presented in this thread, so I’m posting it here for completeness/awareness.

We RCA’d our issue to an Ubuntu kernel update, specifically:

# functional
Linux test-vm-4 5.4.0-1072-azure #75~18.04.1-Ubuntu SMP Wed Mar 2 14:41:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux 
# breaks cross-node traffic
Linux test-vm-5 5.4.0-1073-azure #76~18.04.1-Ubuntu SMP Thu Mar 10 11:17:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux 

Downgrading the kernel to 5.4.0-1072 (the commands below remove the 1073 packages) restored cross-node container connectivity:

 sudo apt remove \
  linux-azure-5.4-cloud-tools-5.4.0-1073 \
  linux-azure-5.4-headers-5.4.0-1073 \
  linux-azure-5.4-tools-5.4.0-1073 \
  linux-modules-5.4.0-1073-azure \
  linux-modules-extra-5.4.0-1073-azure
sudo reboot
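
To keep unattended upgrades from pulling the 1073 kernel straight back in, the kernel meta-package can be held until a fixed version is available (a hedged sketch; the exact meta-package name depends on the image):

sudo apt-mark hold linux-azure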

We initially thought the connectivity loss was related to a Docker Swarm upgrade (specifically to 20.10). However, we later determined that it wasn’t the Docker upgrade at all; it was the reboot we performed while doing it (which loaded the new kernel on our test environments).

Edit: the new kernel (1073) went live earlier this week (2022-03-22*).

This seems to be a problem only with the VMware virtual NIC when used with the VMXNET3 driver.

See : https://mails.dpdk.org/archives/dev/2018-September/111646.html

Hi everyone, same problem here.

>cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
>uname -a
Linux c04d03 5.10.0-8-amd64 #1 SMP Debian 5.10.46-5 (2021-09-23) x86_64 GNU/Linux
>docker version
Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:22 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:31 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I’m trying to reach a service on port 2002, exposed through an overlay network on my swarm cluster. It is impossible when going through localhost, whereas it works when targeting a remote node.

Failure:

>telnet localhost 2002
Trying 127.0.0.1...
>sudo tcpdump -i docker_gwbridge -vvv
tcpdump: listening on docker_gwbridge, link-type EN10MB (Ethernet), snapshot length 262144 bytes
14:07:37.406988 IP (tos 0x0, ttl 64, id 19776, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.1.60094 > 172.18.0.2.2002: Flags [S], cksum 0x5856 (incorrect -> 0xf1ca), seq 4288015020, win 65495, options [mss 65495,sackOK,TS val 569509196 ecr 0,nop,wscale 7], length 0
14:07:38.416134 IP (tos 0x0, ttl 64, id 19777, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.1.60094 > 172.18.0.2.2002: Flags [S], cksum 0x5856 (incorrect -> 0xedd9), seq 4288015020, win 65495, options [mss 65495,sackOK,TS val 569510205 ecr 0,nop,wscale 7], length 0
14:07:40.432125 IP (tos 0x0, ttl 64, id 19778, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.1.60094 > 172.18.0.2.2002: Flags [S], cksum 0x5856 (incorrect -> 0xe5f9), seq 4288015020, win 65495, options [mss 65495,sackOK,TS val 569512221 ecr 0,nop,wscale 7], length 0
14:07:42.480127 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.18.0.2 tell 172.18.0.1, length 28
14:07:42.480200 ARP, Ethernet (len 6), IPv4 (len 4), Reply 172.18.0.2 is-at 02:42:ac:12:00:02 (oui Unknown), length 28

The tx-checksum-ip-generic off trick DOES work, but I do not want to use it, as it should not be necessary.

IMPORTANT

  • Everything is OK if I downgrade the kernel to linux-image-4.19.0-17-amd64 (4.19.194-3)

Thank you for your work !!!

Hello, I believe I faced this issue when I upgraded to 20.10.17 on Ubuntu 22.04 (Jammy). The network was blocking all requests to containers. I was able to fix it by downgrading to 20.10.16 using:

apt-get install docker-ce=5:20.10.16~3-0~ubuntu-jammy docker-ce-cli=5:20.10.16~3-0~ubuntu-jammy containerd.io docker-compose-plugin
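
To keep apt from upgrading straight back to the broken version afterwards, the Docker packages can be held:

sudo apt-mark hold docker-ce docker-ce-cli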

@RicardoViteriR Thank you, this is the only solution that helped me!