moby: Docker Swarm-managed IPVS configurations become corrupt over time
Description
We are running a test Docker Swarm cluster in AWS. We recently noticed that requests to service IPs get routed to the wrong services. Removing and re-adding the affected service makes the issue disappear for a while, but it eventually reappears after services are updated to newer versions of the software under test.
The cause of these routing problems is that the IPVS configurations that Docker Swarm manages to implement service IPs for Docker services become corrupted. An example of a corrupt IPVS table and the matching iptables configuration is shown below.
In the example, every service except 10.0.0.253 should have two instances, and 10.0.0.253 should have one. (The fwmark values in the MARK rules are hexadecimal, so 0x105-0x109 correspond to the FWM pools 261-265 in the IPVS table.) As the IPVS table shows, some container IPs are entered into multiple IPVS pools, and some pools contain container IPs (e.g. 10.0.0.47 in pool 262) whose containers are not even running anymore; a way to cross-check pool members against the containers actually attached to the overlay network is sketched after the dumps below.
Example corrupt IPVS table:
# nsenter -t 2468 -n sh -c 'iptables-save | grep -e MARK && ipvsadm'
-A OUTPUT -d 10.0.0.52/32 -j MARK --set-xmark 0x105/0xffffffff
-A OUTPUT -d 10.0.0.253/32 -j MARK --set-xmark 0x109/0xffffffff
-A OUTPUT -d 10.0.0.21/32 -j MARK --set-xmark 0x107/0xffffffff
-A OUTPUT -d 10.0.0.2/32 -j MARK --set-xmark 0x108/0xffffffff
-A OUTPUT -d 10.0.0.48/32 -j MARK --set-xmark 0x106/0xffffffff
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 261 rr
-> 10.0.0.38:0 Masq 1 0 0
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.49:0 Masq 1 0 0
FWM 262 rr
-> 10.0.0.15:0 Masq 1 0 0
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.54:0 Masq 1 0 0
-> 10.0.0.254:0 Masq 1 0 0
FWM 263 rr
-> 10.0.0.19:0 Masq 1 0 0
FWM 264 rr
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.49:0 Masq 1 0 0
-> 10.0.0.53:0 Masq 1 0 0
FWM 265 rr
-> 10.0.0.54:0 Masq 1 0 0
Moreover, the IPVS tables weren’t consistent between Swarm worker nodes.
In the modified IPVS output below (FWM labels replaced with the last octet of the matching service IP, since the mark values differ between worker nodes), it is evident that e.g. service IP 10.0.0.21 has a different IPVS configuration on worker-1 and worker-3.
worker-3# nsenter -t 2359 -n sh -c 'iptables-save | grep -e MARK && ipvsadm'
-A OUTPUT -d 10.0.0.2/32 -j MARK --set-xmark 0x109/0xffffffff
-A OUTPUT -d 10.0.0.253/32 -j MARK --set-xmark 0x105/0xffffffff
-A OUTPUT -d 10.0.0.52/32 -j MARK --set-xmark 0x106/0xffffffff
-A OUTPUT -d 10.0.0.48/32 -j MARK --set-xmark 0x107/0xffffffff
-A OUTPUT -d 10.0.0.21/32 -j MARK --set-xmark 0x108/0xffffffff
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM .253 rr
-> 10.0.0.54:0 Masq 1 0 0
FWM .52 rr
-> 10.0.0.38:0 Masq 1 0 0
-> 10.0.0.49:0 Masq 1 0 0
FWM .48 rr
-> 10.0.0.15:0 Masq 1 0 0
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.54:0 Masq 1 0 0
-> 10.0.0.254:0 Masq 1 0 0
FWM .21 rr
-> 10.0.0.19:0 Masq 1 0 0
-> 10.0.0.51:0 Masq 1 0 0
FWM .2 rr
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.53:0 Masq 1 0 0
worker-1# nsenter -t 3106 -n sh -c 'iptables-save | grep -e MARK && ipvsadm'
-A OUTPUT -d 10.0.0.52/32 -j MARK --set-xmark 0x105/0xffffffff
-A OUTPUT -d 10.0.0.253/32 -j MARK --set-xmark 0x109/0xffffffff
-A OUTPUT -d 10.0.0.2/32 -j MARK --set-xmark 0x108/0xffffffff
-A OUTPUT -d 10.0.0.48/32 -j MARK --set-xmark 0x106/0xffffffff
-A OUTPUT -d 10.0.0.21/32 -j MARK --set-xmark 0x107/0xffffffff
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM .253 rr
-> 10.0.0.54:0 Masq 1 0 0
FWM .52 rr
-> 10.0.0.38:0 Masq 1 0 0
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.49:0 Masq 1 0 0
FWM .48 rr
-> 10.0.0.15:0 Masq 1 0 0
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.54:0 Masq 1 0 0
-> 10.0.0.254:0 Masq 1 0 0
FWM .21 rr
-> 10.0.0.19:0 Masq 1 0 0
FWM .2 rr
-> 10.0.0.47:0 Masq 1 0 0
-> 10.0.0.49:0 Masq 1 0 0
-> 10.0.0.53:0 Masq 1 0 0
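For reference, a rough way to cross-check which container IPs are actually attached to the overlay network on a given node (the network name my-overlay below is just a placeholder) is to ask Docker for the network's containers and compare their IPs against the IPVS backends above:
# List name and IP of every container attached to the overlay network on this node
docker network inspect my-overlay -f '{{range .Containers}}{{println .Name .IPv4Address}}{{end}}'
Note that on a Swarm node this only lists the containers attached locally, so the outputs from all nodes have to be combined before comparing against the IPVS pools.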
Steps to reproduce the issue:
No clear reproduction steps have been identified. We are hoping for advice on what kind of logging the Docker daemon can produce to help us debug this on our end.
Describe the results you received:
IPVS configurations became corrupt over time.
Describe the results you expected:
Expected the IPVS configurations in containers to stay synchronized with the service configurations and consistent between Swarm cluster nodes.
Additional information you deem important (e.g. issue happens only occasionally):
Is there some logging we could enable in the Docker daemon on each Swarm node to get a better idea of what it thinks it is doing when it changes the IPVS configurations?
Output of docker version:
$ docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Fri Jan 20 05:47:10 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Fri Jan 20 05:47:10 2017
OS/Arch: linux/amd64
Output of docker info:
$ docker info
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 5
Server Version: 1.12.6
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null overlay
Swarm: active
NodeID: 17d6uvei1gu4cl956dcqbru2g
Is Manager: true
ClusterID: 3dg2kejti78ucxky28skve8cv
Managers: 3
Nodes: 6
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: REDACTED
Runtimes: runc
Default Runtime: runc
Security Options: seccomp selinux
Kernel Version: 4.8.17-coreos
Operating System: Container Linux by CoreOS 1298.1.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.862 GiB
Name: swarm-manager-3
ID: OMFM:USMQ:5UGI:HZYZ:WOBA:GH6S:FEOZ:R46Z:NWCQ:Y6ZB:HX5I:3L56
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.):
AWS, CoreOS.
About this issue
- State: closed
- Created 7 years ago
- Comments: 15 (7 by maintainers)
@tazle thanks for the report, and yes, as you suggested, the IPVS rules must stay coherent and consistent across the nodes. Since the IPVS rules (both additions and removals) are distributed to the nodes via the gossip protocol, the inconsistency could originate either on the node that generates an update or on a node that receives it. Hence, it is useful to dig into the Docker daemon logs on all the nodes and look for any obvious errors. Can you please share those logs?
You can enable debug logs by passing the -D flag when running dockerd. You can also enable the debug flag dynamically using configuration reloading: https://docs.docker.com/engine/reference/commandline/dockerd/#/configuration-reloading
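For example (a sketch only; the config path and use of pidof are assumptions about your setup), debug logging can be turned on without restarting the daemon by setting the debug key in the daemon configuration and sending SIGHUP to dockerd:
# /etc/docker/daemon.json (path assumed; adjust for your distribution)
# {
#   "debug": true
# }
# "debug" is one of the reloadable options, so a SIGHUP applies it live:
kill -SIGHUP $(pidof dockerd)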
Since this is hard to reproduce and happens over time, it would be useful to enable debug logging on all the daemons, create a service, and monitor it (taking periodic snapshots of the iptables and IPVS rules in all the relevant namespaces). When the issue happens, please pass on all the snapshotted data along with the daemon logs; that will help us narrow down the root cause.
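As a rough sketch of such snapshotting (it reuses the same nsenter approach as in the report; the output directory is just a placeholder), something like this could be run periodically on every node:
# Dump MARK rules and IPVS state from each running container's network
# namespace into timestamped files for later comparison.
mkdir -p /var/log/ipvs-snapshots
ts=$(date +%Y%m%dT%H%M%S)
for c in $(docker ps -q); do
  pid=$(docker inspect --format '{{.State.Pid}}' "$c")
  nsenter -t "$pid" -n sh -c 'iptables-save | grep -e MARK; ipvsadm -Ln' \
    > "/var/log/ipvs-snapshots/${ts}-${c}.txt" 2>&1
done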