moby: UDP traffic source IP is lost

Description of problem:

I have a container created with -p 1234:1234/udp. When I tcpdump on the host I can see UDP traffic coming from various sources:

# tcpdump -nn 'udp and dst port 1234'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
02:03:58.896506 IP 192.168.1.1.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 159
02:03:58.896747 IP 192.168.1.1.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 148
02:03:58.902976 IP 192.168.1.1.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 159
02:03:58.903310 IP 192.168.1.1.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 185
02:03:59.275255 IP 192.168.153.2.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 153
02:03:59.275341 IP 192.168.153.2.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 149
02:03:59.726084 IP 192.168.230.1.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 146
02:03:59.726174 IP 192.168.230.1.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 153
02:03:59.726296 IP 192.168.230.1.514 > 192.168.1.210.1234: SYSLOG local0.info, length: 205

But when I tcpdump in the container, the packets all have docker0’s source IP:

root@6dc50e49a89d:/# tcpdump -nn 'udp and dst port 1234'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
09:06:13.804038 IP 172.17.42.1.59711 > 172.17.0.5.1234: UDP, length 152
09:06:13.804321 IP 172.17.42.1.59711 > 172.17.0.5.1234: UDP, length 192
09:06:13.804520 IP 172.17.42.1.59711 > 172.17.0.5.1234: UDP, length 150
09:06:14.207527 IP 172.17.42.1.37990 > 172.17.0.5.1234: UDP, length 154
09:06:14.864310 IP 172.17.42.1.59711 > 172.17.0.5.1234: UDP, length 148
09:06:14.864566 IP 172.17.42.1.59711 > 172.17.0.5.1234: UDP, length 150
09:06:14.864866 IP 172.17.42.1.59711 > 172.17.0.5.1234: UDP, length 147

docker version:

Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64

docker info:

Containers: 3
Images: 72
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 78
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-24-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 6
Total Memory: 23.55 GiB
Name: core01
ID: 2OJM:LL4R:PJE4:SOEZ:DX2E:EVA2:5TOS:NKIL:ZSTP:VAXH:3YG5:AYK6
WARNING: No swap limit support

uname -a:

Linux core01 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.):

Bare metal ubuntu 14.04.3

How reproducible: reproducible on the machine above and another machine running 1.6.2.

Steps to Reproduce:

Run container with UDP port forwarded
Run tcpdump on host and in container to display traffic on the forwarded port
Send UDP traffic to the host

Actual Results: packets displayed with tcpdump have a different source address outside/inside the container

Expected Results: packets displayed with tcpdump have the same source address outside/inside the container

About this issue

Original URL
State: closed
Created 9 years ago
Reactions: 4
Comments: 16 (5 by maintainers)

Commits related to this issue

Clear conntrack entries for UDP ports Conntrack entries are created for UDP flows even if there's nowhere to route these packets (ie. no listening socket and no NAT rules to apply). Moreover, iptable... — committed to akerouanton/docker by akerouanton a year ago
Clear conntrack entries for published UDP ports Conntrack entries are created for UDP flows even if there's nowhere to route these packets (ie. no listening socket and no NAT rules to apply). Moreove... — committed to akerouanton/docker by akerouanton a year ago
Clear conntrack entries for published UDP ports Conntrack entries are created for UDP flows even if there's nowhere to route these packets (ie. no listening socket and no NAT rules to apply). Moreove... — committed to akerouanton/docker by akerouanton a year ago
Clear conntrack entries for published UDP ports Conntrack entries are created for UDP flows even if there's nowhere to route these packets (ie. no listening socket and no NAT rules to apply). Moreove... — committed to corhere/moby by akerouanton a year ago
Clear conntrack entries for published UDP ports Conntrack entries are created for UDP flows even if there's nowhere to route these packets (ie. no listening socket and no NAT rules to apply). Moreove... — committed to akerouanton/docker by akerouanton a year ago

Most upvoted comments

There are a number of bug entries to the same issue, but the summary of @oopschen is IMHO the most relevant that I have found. Since we have the same issue with a log server in our production environment I spent some more time to analyse it.

TL;DR: conntrack -D -p udp after the container has started fixes the issue

I think this is caused by a delay between starting the docker userland-proxy for that port and inserting the iptable rules for destination NAT. When network packets are forwarded to the user-land proxy while the iptable rules are not in place the netfilter connection tracking creates an entry for that connection in the conntrack table. Even after the iptable rules for destination NAT are created, the connection tracking causes the network packets to still be forwarded to the userland-proxy. Only when the connection tracking entries are flushed by conntrack -D -p udp the iptable rules for destination NAT are actually evaluated and netfilter creates new connection entries for the destination NAT.

Disabling the docker userland-proxy (dockerd --userland-proxy=false) does not help either, because then dockerd itself binds to the exposed port on the host (see #28589) and connection tracking might start to track the packets before the iptable rules are in place. In this case the packets are not even forwarded to the container until the connection tracking records have been flushed.

mbonato on Dec 6, 2018

Seem to hit on docker for windows too. Any thought on how to setup a workaround? I would guess the issue occurs somewhere in the virtual machine used on windows.

elupus on Apr 28, 2020

I think @mbonato is correct that it’s about the netfilter connection tracking, but I don’t think it has anything specifically to do with the userland proxy being up or not, or any delay between that and the DNAT rule being inserted. The conntrack record is created even if there’s nothing listening or responding on the port.

Instead the timing comparison should be between when the conntrack record is created, and when the DNAT rule is inserted. If the connection is created anytime before the rule is present, even well before userland proxy is up, a null binding (rule?) is created to record that no NAT is required. Once that’s in place, as long as it hasn’t expired, any subsequent packets will find and use that null rule by looking up the tuple of (src ip, src port, dst ip, dst port), without checking the nat table for the now-possibly-present DNAT rule.

Basically it boils down to – newly added NAT rules do not apply to existing connections, which is kinda expected except that conntrack for certain UDP protocols or implementations might behave a little differently from what we expect.

So the required conditions to run into this are:

UDP, and
protocol is one-way and does not expect responses or acknowledgements from receiver and hence doesn’t reconnect, OR if protocol expects that but the reconnection uses the same src port, and
subsequent packets arrive at most net.nefilter.nf_conntrack_udp_timeout apart, preventing the null connection tracking record from expiring (default 30 sec).

Syslog over UDP is a prime candidate for this.

Perhaps the ideal and more targeted fix would be to flush all conntrack entries that match the protocol, dst port, and possibly dst ip, once both the userland proxy/dockerd port bind and DNAT rule is in place, i.e. the equivalent of conntrack --delete --proto udp --orig-port-dst <port> [--orig-dst <ip>].

edit, related: https://github.com/moby/moby/issues/8795#issuecomment-386267804 (workaround to watch for started containers and call conntrack -D)

cflee on Jul 19, 2019

I have the same issue in my production environment. Centos 7.3 Docker 18.06-ce Docker Compose 1.22

My UDP program receive packet from the gateway actually from my external device with a real ip randomly. Sometime i restart my docker-compose, it will be fixed.

======= UPDATE The problem may appear on any version. Steps to reproduce:

Remove your udp receiving program and the bridge network
Start your udp sender
Create the bridge
Start your udp receiving program

The source ip of udp must be the bridge gateway.

May the bridge learn the destination MAC address is the bridge gateway not the veth pair. I haven’t test it on all docker versions and all kenerl.

======== UPDATE IT IS NOT A BUG, BUT NETWORK PROTOCOL MAKES IT WORK THIS WAY When our udp receiving program not start up in docker, the bridge network created by docker will see the redirect all traffic to bridge gateway since no ip is claimed to have the destination ip. As a result, the gateway learn that MAC address is the gateway, when later the program start up, it claim it’s MAC address for the ip. MOST IMPORTANT, the arp has the entry for the ip, it won’t use the MAC address program claimed. All traffic will go through the gateway to the final destination ip, and the source ip will be changed.

SOLUTION When restart our application, please cut off all the traffice until the service is on.

oopschen on Nov 4, 2018