moby: Unable to retrieve user's IP address in docker swarm mode

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 155
 Running: 65
 Paused: 0
 Stopped: 90
Images: 57
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 868
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host overlay null bridge
Swarm: active
 NodeID: 0ddz27v59pwh2g5rr1k32d9bv
 Is Manager: true
 ClusterID: 32c5sn0lgxoq9gsl1er0aucsr
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot interval: 10000
  Heartbeat tick: 1
  Election tick: 3
 Dispatcher:
  Heartbeat period: 5 seconds
 CA configuration:
  Expiry duration: 3 months
 Node Address: 172.31.24.209
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-92-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.42 GiB
Name: ip-172-31-24-209
ID: 4LDN:RTAI:5KG5:KHR2:RD4D:MV5P:DEXQ:G5RE:AZBQ:OPQJ:N4DK:WCQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: panj
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

Steps to reproduce the issue:

  1. Run the following service, which publishes port 80:
docker service create \
--name debugging-simple-server \
--publish 80:3000 \
panj/debugging-simple-server
  2. Try connecting with http://<public-ip>/.

Describe the results you received: Neither ip nor header.x-forwarded-for is the correct user’s IP address.

Describe the results you expected: ip or header.x-forwarded-for should be the user’s IP address. The expected result can be achieved using a standalone docker container: docker run -d -p 80:3000 panj/debugging-simple-server. You can see both results via the following links: http://swarm.issue-25526.docker.takemetour.com:81/ http://container.issue-25526.docker.takemetour.com:82/

Additional information you deem important (e.g. issue happens only occasionally): This happens on both global mode and replicated mode.

I am not sure if I missed anything that should solve this issue easily.

In the meantime, I think I have to do a workaround which is running a proxy container outside of swarm mode and let it forward to published port in swarm mode (SSL termination should be done on this container too), which breaks the purpose of swarm mode for self-healing and orchestration.
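Such a workaround might look like the following compose fragment for the standalone proxy (entirely hypothetical: the file, service name, and mounted config path are illustrative, not from this issue):

```yaml
# Hypothetical sketch of the workaround: a non-swarm proxy container on
# the host network (so it sees real client IPs), forwarding to a port
# published by the swarm service (e.g. 8181). Names are illustrative.
services:
  edge-proxy:
    image: nginx:alpine
    network_mode: host   # not available to swarm-deployed services
    restart: always
    volumes:
      - ./edge.conf:/etc/nginx/conf.d/default.conf:ro
```

The proxy itself then becomes the single point that swarm cannot heal or reschedule, which is exactly the trade-off described above.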

About this issue

  • Original URL
  • State: open
  • Created 8 years ago
  • Reactions: 218
  • Comments: 353 (35 by maintainers)

Commits related to this issue

Most upvoted comments

I’ve also run into the issue when trying to run logstash in swarm mode (for collecting syslog messages from various hosts). The logstash “host” field always appears as 10.255.0.x, instead of the actual IP of the connecting host. This makes it totally unusable, as you can’t tell which host the log messages are coming from. Is there some way we can avoid translating the source IP?
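For a UDP workload like this, the commonly cited mitigation is host-mode publishing, which bypasses the ingress mesh entirely; a hedged compose sketch (image tag and ports are assumptions, not from this thread), with the usual caveat that you lose mesh routing and can only reach the service on nodes actually running a task:

```yaml
# Hedged sketch: publishing a syslog listener with mode: host so the
# container sees the real source IP. Image and port numbers are
# illustrative; deploy globally so every node can accept traffic.
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:7.17.0
    deploy:
      mode: global
    ports:
      - target: 5514
        published: 514
        protocol: udp
        mode: host
```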

I agree with @dack , given the ingress network is using IPVS, we should solve this issue using IPVS so that the source IP is preserved and presented to the service correctly and transparently.

The solution needs to work at the IP level so that services that are not based on HTTP can still work properly as well (we can’t rely on HTTP headers…).

And I can’t stress enough how important this is: without it, there are many services that simply can’t operate at all in swarm mode.

People really should stop saying “Mode: host” = working, because that’s not using ingress. It makes it impossible to have just one container of a service running on the swarm while still being able to access it via any host. You either have to make the service “global” or you can only access it on the host it is running on, which kinda defeats the purpose of Swarm.

TLDR: “Mode: Host” is a workaround, not a solution

We’ve now released v3.1.0 of https://github.com/newsnowlabs/docker-ingress-routing-daemon, which modifies docker’s ingress mesh routing to expose true client IPs to service containers:

  • implemented purely through routing and firewall rules; and so
  • without the need for running any additional application layers like traefik or other reverse proxies; and so
  • there’s no need to reconfigure your existing application.

As far as I know, the docker-ingress-routing-daemon is the most lightweight way to access client IPs from within containers launched by docker services.

Summary of features:

  • Support for replacing docker’s masquerading with routing on incoming traffic either for all published services, or only for specified services on specified TCP or UDP ports
  • Support for recent kernels (such as employed in Google Cloud images) that set rp_filter=1 (strict) inside service containers (though this can be disabled)
  • Automatic installation of kernel tweaks that improve IPVS performance in production (though this can be disabled)

Please check it out and raise any issues you find.

To Docker,

Wake up! There is an obvious problem given how many people are involved in this issue (there are others with the same cause). All we’re getting are people who repeat over and over again that there is a workaround, even though it’s been explained quite a few times why that workaround is not a solution. The very word “workaround” indicates that it is a temporary thing that will be resolved later. It’s been over 3 years since the issue was created and for all that time the response is “there is a workaround”.

To all Swarm users,

Let’s be realistic. The sad truth is that no one, including Docker, truly cares about Swarm. Everyone moved to k8s and there are no “real” investments in Swarm. The project is on life-support waiting to die so do not expect this issue to be fixed. Be smart and move to k8s.

@thaJeztah thanks for the workaround 😃 If you are deploying your proxy with compose file version 3, the new publish syntax is not supported, so we can patch the deployed service using this command (replace nginx_proxy with your service name):

docker service update nginx_proxy \
	--publish-rm 80 \
	--publish-add "mode=host,published=80,target=80" \
	--publish-rm 443 \
	--publish-add "mode=host,published=443,target=443"

Whether you call it a bug or a feature request, ingress mesh without source NAT is (in my opinion) essential. There are many applications that break when they can’t see the true source IP. Sure, in the case of web servers you can reverse proxy using a host node and add client IP headers. However, this adds overhead and is probably not an option for non-web-based applications. For an application that actually needs the real source IP on the packet to be correct, the only option is to not use the ingress mesh. That throws out a large part of the benefit of using swarm in the first place.

Just tried this again with:

Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:29:52 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:22 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

and the following docker compose:

version: "3.3"

services:

  traefik:
    image: "traefik:v2.0.0-rc3"
    container_name: "traefik"
    command:
      #- "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

  whoami:
    image: "containous/whoami"
    container_name: "simple-service"
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.whoami.rule=HostRegexp(`{any:.*}`)"
        - "traefik.http.routers.whoami.entrypoints=web"
        - "traefik.http.services.whoami.loadbalancer.server.port=80"

whoami output was:

Hostname: 085c373eb06d
IP: 127.0.0.1
IP: 10.0.1.10
IP: 172.19.0.4
RemoteAddr: 10.0.1.11:51888
GET / HTTP/1.1
Host: testserver.nub.local
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.5
Dnt: 1
Upgrade-Insecure-Requests: 1
X-Forwarded-For: 10.0.0.2
X-Forwarded-Host: testserver.nub.local
X-Forwarded-Port: 80
X-Forwarded-Proto: http
X-Forwarded-Server: ad14e372f6e9
X-Real-Ip: 10.0.0.2

So no, it still doesn’t work.

You can use traefik in host mode to get the real IP:

ports:
  - target: 80
    published: 80
    mode: host
  - target: 443
    published: 443
    mode: host

@tkeeler33 seems to work for me;

$ docker network create -d overlay swarm-net

$ docker service create \
  --name web \
  --publish mode=host,published=80,target=80 \
  --network swarm-net \
  --mode=global \
  nginx:alpine

$ docker service create --name something --network swarm-net nginx:alpine

Test if the web service is able to connect with the something service on the same network:

docker exec -it web.xczrerg6yca1f8ruext0br2ow.kv8iqp0wdzj3bw7325j9lw8qe sh -c 'ping -c3 -w1 something'
PING something (10.0.0.4): 56 data bytes
64 bytes from 10.0.0.4: seq=0 ttl=64 time=0.251 ms

--- something ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.251/0.251/0.251 ms

+1 for a solution for this issue.

The inability to retrieve the user’s IP prevents us from using monitoring solutions like Prometheus.

@trajano is right, the Windows client was the problem, deployment with the Linux client worked.

But I don’t understand why you even need the host or bridge network? The following works just fine for me, i.e. I get real client IP addresses in nginx:

version: '3.4'
services:
  nginx:
    ports:
      - mode: host
        protocol: tcp
        published: 80
        target: 80

2020 and still not fixed, what a drag. Seems like a very important feature.

For anyone running nginx on DigitalOcean with docker swarm and trying to get the real $remote_addr instead of just 10.255.0.2 in your nginx logs: you can use the solution from @coltenkrauter. The catch is that you can only run one nginx container on the host with this solution, which should be OK for most people.

Just change your docker-compose.yml file:

INCORRECT

services:
  nginx:
    ports:
      - "80:80"
      - "443:443"

CORRECT

services:
  nginx:
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host

edit: now we’re all guaranteed to get the right answer

Running into the same issue. Is this going to be addressed? It seems like basic functionality that should be slated for a release.

There is very little chance this is ever going to be fixed. AFAIK everyone considers that k8s won the “race” and swarm is not needed, but I would say both can co-exist and be properly used depending on the needs and skills of the team using them. RIP swarm 😃

Why do people expect that other people will do the work for them?

I’d love to be the hero and take care of this, but the reality is I’m working on many other things and this has no effect on my day to day. Does this affect your day to day? We’d love some help getting this resolved!

I’ve also looked at this multiple times and it really doesn’t seem like there is a way to make this work with IPVS NAT, which is what the magical swarm routing is using.

I agree that k8s is much more flexible here. If it suits your needs better then use it. Complaining that it’s not fixed and then threatening to switch to k8s really has no place in our issue tracker and is just generally unhelpful.

+1, this really is a showstopper. I would believe the majority of applications need the real client’s IP. Just think of a mail server stack: you can’t afford to accept mail from arbitrary hosts.

OK, I’ve had a brief look through the code and I think I have a slightly better understanding of it now. It does indeed appear to be using IPVS as stated in the blog. SNAT is done via an iptables rule which is set up in service_linux.go. If I understand correctly, the logic behind it would be something like this (assuming node A receives a client packet for the service running on node B):

  • Swarm node A receives the client packet. IPVS/iptables translates (src ip)->(node A ip) and (dst ip)->(node B ip)
  • The packet is forwarded to node B
  • Node B sends its reply to node A (as that’s what it sees as the src ip)
  • Node A translates the src and dst back to the original values and forwards the reply to the client

I think the reasoning behind the SNAT is that the reply must go through the same node that the original request came through (as that’s where the NAT/IPVS state is stored). As requests may come through any node, the SNAT is used so that the service node knows which node to route the request back through. In an IPVS setup with a single load balancing node, that wouldn’t be an issue.

So, the question is then how to avoid the SNAT while still allowing all nodes handle incoming client requests. I’m not totally sure what the best approach is. Maybe there’s a way to have a state table on the service node so that it can use policy routing to direct replies instead of relying on SNAT. Or maybe some kind of encapsulation could help (VXLAN?). Or, the direct routing method of IPVS could be used. This would allow the service node to reply directly to the client (rather than via the node that received the original request) and would allow adding new floating IPs for services. However, it would also mean that the service can only be contacted via the floating IP and not the individual node IPs (not sure if that’s a problem for any use cases).

@marech the standalone container listens on port 80 and then proxies to localhost:8181:

server {
  listen 80 default_server;
  location / {
    proxy_set_header        Host $host;
    proxy_set_header        X-Real-IP $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header        X-Forwarded-Proto $scheme;
    proxy_pass          http://localhost:8181;
    proxy_read_timeout  90;
  }
}

If you have to do SSL termination, add another server block that listens on port 443, then do the SSL termination there and proxy to localhost:8181 as well.

Swarm mode’s nginx publishes 8181:80 and routes to another service based on the request host:

server {
  listen 80;
  server_name your.domain.com;
  location / {
    proxy_pass          http://your-service:80;
    proxy_set_header Host $host;
    proxy_read_timeout  90;
  }
}

server {
  listen 80;
  server_name another.domain.com;
  location / {
    proxy_pass          http://another-service:80;
    proxy_set_header Host $host;
    proxy_read_timeout  90;
  }
}

Seems like everyone leveling up from docker-compose to docker swarm encounters this issue. Happy new year 2021, guys; I hope I won’t see it in 2022 🙈

Below is an improved version of the ingress routing daemon, ingress-routing-daemon-v2, which extends the policy routing rule model to allow each container to route its output packets back to the correct node, without the need for SNAT.

The improved model

In addition to inhibiting the SNAT rule as per the previous model, the new model requires an iptables rule in the ingress_sbox namespace on each node you intend to use as an IPVS load-balancer endpoint (so normally your manager nodes, or a subset of those manager nodes), that assigns a per-node TOS value to all packets destined for any node in the ingress network. (We use the final byte of the node’s ingress network IP.)

As the TOS value is stored within the packet, it can be read by the destination node to which the incoming request has been directed and the packet sent.

Then in the container on the destination node, we arrange to map the TOS value on any incoming packets to a connection mark, using the same value.

Now, since outgoing packets on the same connection will have the same connection mark, we map the connection mark on any outgoing packets to a firewall mark, again using the same value.

Finally, a set of policy routing rules selects a different routing table, designed to route the outgoing packets back to the required load-balancer endpoint node, according to the firewall mark value.

Now, when client requests arrive at the published ports for any node in the swarm, the container (whether on the same and/or other nodes) to which the request is directed will see the original IP address of the client making the request, and be able to route the response back to the originating load-balancer node; which will, in turn, be able to route the response back to the client.

Usage

Setting up

Generate a value for INGRESS_NODE_GATEWAY_IPS specific to your swarm, by running ingress-routing-daemon-v2 as root on every one of your swarm’s nodes that you’d like to use as a load-balancer endpoint (normally only your manager nodes, or a subset of your manager nodes), noting the values shown for INGRESS_DEFAULT_GATEWAY. You only have to do this once, or whenever you add or remove nodes. Your INGRESS_NODE_GATEWAY_IPS should look like 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5 (according to the subnet defined for the ingress network, and the number of nodes).
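For instance, assuming a four-node swarm on the default 10.0.0.0/24 ingress subnet, the resulting value would be built like this (illustrative only; in practice use the INGRESS_DEFAULT_GATEWAY values the daemon prints on each node):

```shell
# Illustrative only: what a four-node INGRESS_NODE_GATEWAY_IPS value
# looks like, assuming an ingress subnet prefix of 10.0.0.
INGRESS_NET=10.0.0
INGRESS_NODE_GATEWAY_IPS="$INGRESS_NET.2 $INGRESS_NET.3 $INGRESS_NET.4 $INGRESS_NET.5"
echo "$INGRESS_NODE_GATEWAY_IPS"   # 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5
```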

Running the daemon

Run INGRESS_NODE_GATEWAY_IPS="<Node Ingress IP List>" ingress-routing-daemon-v2 --install as root on each and every one of your swarm’s nodes (managers and workers) before creating your service. (If your service is already created, then ensure you scale it to 0 before scaling it back to a positive number of replicas.) The daemon will initialise iptables, detect when docker creates new containers, and apply new routing rules to each new container.

If you need to restrict the daemon’s activities to a particular service, then modify [ -n "$SERVICE" ] to [ "$SERVICE" = "myservice" ].

Uninstalling iptables rules

Run ingress-routing-daemon-v2 --uninstall on each node.

Testing

The ingress-routing-daemon-v2 script has been tested with 8 replicas of a web service deployed to a four-node swarm.

Curl requests for the service, directed to any of the specified load-balanced endpoint node IPs, returned successful responses, and examination of the container logs showed the application saw the incoming requests as originating from the Curl client’s IP.

Limitations

As the TOS value can store an 8-bit number, this model can in principle support up to 256 load-balancer endpoint nodes.
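The origin of that limit can be seen in the daemon’s own NODE_ID computation: the per-node TOS value is just the final byte of the node’s ingress-network IP, so at most 256 distinct values exist. A minimal sketch with an illustrative IP:

```shell
# Each endpoint node's TOS value is the final byte of its ingress-network
# IP (hence the 8-bit, 256-node ceiling). The IP below is illustrative.
INGRESS_DEFAULT_GATEWAY=10.0.0.7
NODE_ID=$(echo "$INGRESS_DEFAULT_GATEWAY" | cut -d'.' -f4)
echo "$NODE_ID"   # 7
```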

However as the model requires every container be installed with one iptables mangle rule + one policy routing rule + one policy routing table per manager endpoint node, there might possibly be some performance degradation as the number of such endpoint nodes increases (although experience suggests this is unlikely to be noticeable with <= 16 load-balancer endpoint nodes on modern hardware).

If you add load-balancer endpoint nodes to your swarm - or want to start using existing manager nodes as load-balancer endpoints - you will need to tread carefully, as existing containers will not be able to route traffic back to the new endpoint nodes. Try restarting INGRESS_NODE_GATEWAY_IPS="<Node Ingress IP List>" ingress-routing-daemon-v2 with the updated value for INGRESS_NODE_GATEWAY_IPS, then perform a rolling update of all containers, before using the new load-balancer endpoints.

Scope for native Docker integration

I’m not familiar with the Docker codebase, but I can’t see anything that ingress-routing-daemon-v2 does that couldn’t, in principle, be implemented by Docker natively, but I’ll leave that for the Docker team to consider, or as an exercise for someone familiar with the Docker code.

The ingress routing daemon v2 script

Here is the new ingress-routing-daemon-v2 script.

#!/bin/bash

# Ingress Routing Daemon v2
# Copyright © 2020 Struan Bartlett
# ----------------------------------------------------------------------
# Permission is hereby granted, free of charge, to any person 
# obtaining a copy of this software and associated documentation files 
# (the "Software"), to deal in the Software without restriction, 
# including without limitation the rights to use, copy, modify, merge, 
# publish, distribute, sublicense, and/or sell copies of the Software, 
# and to permit persons to whom the Software is furnished to do so, 
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be 
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS 
# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN 
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 
# SOFTWARE.
# ----------------------------------------------------------------------
# Workaround for https://github.com/moby/moby/issues/25526

if [ "$1" = "--install" ]; then
  INSTALL=1
elif [ "$1" = "--uninstall" ]; then
  INSTALL=0
else
  echo "Usage: $0 [--install|--uninstall]"
fi

echo
echo "  Dumping key variables..."

if [ "$INSTALL" = "1" ] && [ -z "$INGRESS_NODE_GATEWAY_IPS" ]; then
  echo "!!! ----------------------------------------------------------------------"
  echo "!!! WARNING: Using default INGRESS_NODE_GATEWAY_IPS"
  echo "!!! Please generate a list by noting the values shown"
  echo "!!! for INGRESS_DEFAULT_GATEWAY on each of your swarm nodes."
  echo "!!!"
  echo "!!! You only have to do this once, or whenever you add or remove nodes."
  echo "!!!"
  echo "!!! Then relaunch using:"
  echo "!!! INGRESS_NODE_GATEWAY_IPS=\"<Node Ingress IP List>\" $0 --install"
  echo "!!! ----------------------------------------------------------------------"
fi

read INGRESS_SUBNET INGRESS_DEFAULT_GATEWAY \
  < <(docker inspect ingress --format '{{(index .IPAM.Config 0).Subnet}} {{index (split (index .Containers "ingress-sbox").IPv4Address "/") 0}}')

echo "  - INGRESS_SUBNET=$INGRESS_SUBNET"
echo "  - INGRESS_DEFAULT_GATEWAY=$INGRESS_DEFAULT_GATEWAY"

# We need the final bytes of the IP addresses on the ingress network of every node
# i.e. We need the final byte of $INGRESS_DEFAULT_GATEWAY for every node in the swarm
# This shouldn't change except when nodes are added or removed from the swarm, so should be reasonably stable.
# You should configure this yourself, but for now let's assume we have 8 nodes with IPs in the INGRESS_SUBNET numbered x.x.x.2 ... x.x.x.9
if [ -z "$INGRESS_NODE_GATEWAY_IPS" ]; then
  INGRESS_NET=$(echo $INGRESS_DEFAULT_GATEWAY | cut -d'.' -f1,2,3)
  INGRESS_NODE_GATEWAY_IPS="$INGRESS_NET.2 $INGRESS_NET.3 $INGRESS_NET.4 $INGRESS_NET.5 $INGRESS_NET.6 $INGRESS_NET.7 $INGRESS_NET.8 $INGRESS_NET.9"
fi

echo "  - INGRESS_NODE_GATEWAY_IPS=\"$INGRESS_NODE_GATEWAY_IPS\""

# Create node ID from INGRESS_DEFAULT_GATEWAY final byte
NODE_ID=$(echo $INGRESS_DEFAULT_GATEWAY | cut -d'.' -f4)
echo "  - NODE_ID=$NODE_ID"

if [ -z "$INSTALL" ]; then
  echo
  echo "Ingress Routing Daemon v2 exiting."
  exit 0
fi

# Add a rule ahead of the ingress network SNAT rule, that will cause the SNAT rule to be skipped.
[ "$INSTALL" = "1" ] && echo "Adding ingress_sbox iptables nat rule: iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT"
while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -D POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT; do true; done 2>/dev/null
[ "$INSTALL" = "1" ] && nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT

# 1. Set TOS to NODE_ID in all outgoing packets to INGRESS_SUBNET
[ "$INSTALL" = "1" ] && echo "Adding ingress_sbox iptables mangle rule: iptables -t mangle -A POSTROUTING -d $INGRESS_SUBNET -j TOS --set-tos $NODE_ID/0xff"
while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t mangle -D POSTROUTING -d $INGRESS_SUBNET -j TOS --set-tos $NODE_ID/0xff; do true; done 2>/dev/null
[ "$INSTALL" = "1" ] && nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t mangle -A POSTROUTING -d $INGRESS_SUBNET -j TOS --set-tos $NODE_ID/0xff

if [ "$INSTALL" = "0" ]; then
  echo
  echo "Ingress Routing Daemon v2 iptables rules uninstalled, exiting."
  exit 0
fi

echo "Ingress Routing Daemon v2 starting ..."

# Watch for container start events, and configure policy routing rules on each container
# to ensure return path traffic for incoming connections is routed back via the correct interface
# and to the correct node from which the incoming connection was received.
docker events \
  --format '{{.ID}} {{index .Actor.Attributes "com.docker.swarm.service.name"}}' \
  --filter 'event=start' \
  --filter 'type=container' | \
  while read ID SERVICE
  do
    if [ -n "$SERVICE" ]; then
    
      NID=$(docker inspect -f '{{.State.Pid}}' $ID)
      echo "Container ID=$ID, NID=$NID, SERVICE=$SERVICE started: applying policy routes."
      
      # 3. Map any connection mark on outgoing traffic to a firewall mark on the individual packets.
      nsenter -n -t $NID iptables -t mangle -A OUTPUT -p tcp -j CONNMARK --restore-mark

      for NODE_IP in $INGRESS_NODE_GATEWAY_IPS
      do
        NODE_ID=$(echo $NODE_IP | cut -d'.' -f4)
	
	# 2. Map the TOS value on any incoming packets to a connection mark, using the same value.
        nsenter -n -t $NID iptables -t mangle -A PREROUTING -m tos --tos $NODE_ID/0xff -j CONNMARK --set-xmark $NODE_ID/0xffffffff
	
	# 4. Select the correct routing table to use, according to the firewall mark on the outgoing packet.
        nsenter -n -t $NID ip rule add from $INGRESS_SUBNET fwmark $NODE_ID lookup $NODE_ID prio 32700
	
	# 5. Route outgoing traffic to the correct node's ingress network IP, according to its firewall mark
	#    (which in turn came from its connection mark, its TOS value, and ultimately its IP).
        nsenter -n -t $NID ip route add table $NODE_ID default via $NODE_IP dev eth0
	
      done

    fi
  done

After 3 years, no fix?

I’m trying to get my team to build a PR which adds the proxy protocol to the ingress network. We are not Golang programmers, so we find it a bit tricky.

But I’m fervently hoping that the Docker team agrees that the best and most compatible (across the ecosystem) solution is to layer on proxy protocol support to the ingress network.

The complexity comes from the fact that the ingress network not only has to inject its own headers, but it has to support the fact that there might be upstream proxy protocol headers already inserted (for example by Google LB or AWS ELB).
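If PROXY protocol support did land in the ingress network, a service could recover the client address with stock nginx directives from ngx_http_realip_module; a hedged sketch (the trusted source range and upstream name are assumptions):

```nginx
# Hedged sketch: consuming PROXY protocol inside a service container,
# assuming the ingress (or an upstream LB) injected it. The trusted
# range and upstream are illustrative.
server {
    listen 80 proxy_protocol;        # expect PROXY protocol framing
    set_real_ip_from 10.0.0.0/24;    # trust only the ingress network
    real_ip_header proxy_protocol;   # take client IP from the header
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_pass http://app:3000;
    }
}
```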

On Sun, 17 Mar, 2019, 12:17 Daniele Cruciani, notifications@github.com wrote:

I forgot to say that of course your comment is welcomed (or I said it in an obscure way, sorry). But I would like to reinforce the original @PanJ https://github.com/PanJ report:

In the meantime, I think I have to do a workaround which is running a proxy container outside of swarm mode and let it forward to published port in swarm mode (SSL termination should be done on this container too), which breaks the purpose of swarm mode for self-healing and orchestration.

I mean that this “breaks the purpose of swarm mode”, of course only on this specific topic, is enough to deserve more attention.


@darrellenns there are over 200 comments here; I think it would be better to lock and clean this issue, providing the basic “just use host bind if it applies to you” solution while no official solution is provided. Otherwise more people like me will miss that and just keep commenting the same stuff over and over.

I found an acceptable solution for my scenario:

services:
  server:
    image: httpd:2
    deploy:
      mode: global
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    networks:
      - my_second_service
      - another_great_software

This will cause apache to listen on the host computer instead of behind the overlay network (seeing the proper remote IP address), while still proxying requests to other services via the networks option and achieving “high availability” by having it running everywhere.

Hi Roberto, I don’t think it is exaggerated, because host mode exposes single points of failure. Moreover, it requires additional layers of management for load balancing outside the swarm ecosystem.

By saying that you used azure lb yourself, you have kind of validated that argument.

It is tantamount to saying that “to run swarm with client ip propagation, make sure you are using an external load balancer that you setup…Or use one of the cloud services”.

We are not saying that it is not a temporary workaround…But it would be ignoring the promise of Swarm if we all do not categorically recognize the shortcoming.

On Thu, 5 Jul, 2018, 14:16 Roberto Fabrizi, notifications@github.com wrote:

@r3pek https://github.com/r3pek While I agree with you that you lose ingress if you use host mode to solve this predicament, I’d say that it hardly defeats the whole purpose of Swarm, which does so much more than a public-facing ingress network. In our usage scenario we have in the same overlay swarm: management replicated containers that should only be accessed over the intranet -> they don’t need the caller’s IP, therefore they are configured “normally” and take advantage of the ingress; non-exposed containers -> nothing to say about these (I believe you are underestimating the power of being able to access them via their service name though); a public-facing container -> this is an nginx proxy that does https and URL-based routing. It was defined global even before the need to x-forward the client’s real IP, so no real issue there.

Having nginx global and not having ingress means that you can reach it via any ip of the cluster, but it’s not load balanced, so we added a very very cheap and easy to set up L4 Azure Load Balancer in front of the nginx service.

As you say, host is a workaround, but saying that enabling it completely defeats the purpose of Docker Swarm is a little exaggerated imo.


It’s 2018. Anything new about this issue?
In swarm mode, I can’t use the nginx request-limit module: $remote_addr always shows 10.255.0.2. This is a really serious problem with Docker Swarm. Perhaps I should try Kubernetes starting today.

@tlvenn as far as I know, Docker Swarm uses masquerading, since it’s the most straightforward way and guaranteed to work in most configurations. Plus, this is the only mode that also allows ports to be remapped [re: @dack], which is handy. In theory, this issue could be solved by using IPIP encapsulation mode – the packet flow will be like this then:

  • A packet arrives at the gateway server – in our case any node of the swarm – and IPVS on that node determines that it is in fact a packet for a virtual service, based on its destination IP address and port.
  • Packet is encapsulated into another IP packet and sent over to the real server, which was chosen based on the load balancing algorithm.
  • The real server receives the enclosing packet, decapsulates it and sees real client IP as source and virtual service IP as destination. All real servers are supposed to have a non-ARPable interface alias with the virtual service IP so that they would assume that this packet is actually destined for them.
  • The real server processes the packet and sends the response back to the client directly. The source IP in this case will be the virtual service IP, so no martian replies involved, which is good.

There are, of course, many caveats and things which can go wrong, but generally this is possible, and IPIP mode is widely used in production.
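For illustration, here is a minimal, untested sketch of what an IPVS virtual service in IPIP (tunneling) mode could look like. The VIP and real-server addresses are hypothetical, and since ipvsadm requires root, the script defaults to a dry run that just prints the commands:

```shell
#!/bin/sh
# Sketch only: IPVS virtual service in IPIP (tunneling) mode.
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

VIP=203.0.113.10   # hypothetical virtual service IP

# On the director: create the virtual service (round-robin scheduling).
run ipvsadm -A -t "$VIP:80" -s rr

# Add real servers in tunneling mode (-i): packets are encapsulated in IPIP
# and each real server replies to the client directly.
run ipvsadm -a -t "$VIP:80" -r 10.0.0.5:80 -i
run ipvsadm -a -t "$VIP:80" -r 10.0.0.6:80 -i

# On each real server: accept the VIP on a non-ARPable alias so the
# decapsulated packet is treated as locally destined.
run ip addr add "$VIP/32" dev lo
run sysctl -w net.ipv4.conf.all.arp_ignore=1
run sysctl -w net.ipv4.conf.all.arp_announce=2
```

Run with DRY_RUN=0 as root on the respective hosts; as written it only prints the intended command sequence.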

I believe I may have found a workaround for this issue, with the current limitation that service container replicas must all be deployed to a single node, for example with --constraint-add='node.hostname==mynode', or with a set of swarms each consisting of a single node.

The problem

The underlying problem is caused by the SNAT rule in the iptables nat table in the ingress_sbox namespace, which causes all incoming requests to be seen by containers to have the node’s IP address in the ingress network (e.g. 10.0.0.2, 10.0.0.3, …, in the default ingress network configuration), e.g.:

iptables -t nat -A POSTROUTING -d 10.0.0.0/24 -m ipvs --ipvs -j SNAT --to-source 10.0.0.2

However, removing this SNAT rule means that while containers still receive incoming packets - now originating from the original source IP - outgoing packets sent back to the original source IP are sent via the container’s default gateway, which is not on the same ingress network but on the docker_gwbridge network (e.g. 172.31.0.1), and those packets are then lost.

The workaround

So the workaround comprises:

  1. removing (in fact, inhibiting) this SNAT rule in the ingress_sbox namespace;
  2. creating a policy routing rule for swarm service containers, which forces those outgoing packets back via the node’s ingress network IP address they arrived through (e.g. 10.0.0.2);
  3. automating the addition of the policy routing rules, so that every new service container has them promptly installed upon creation.

  1. To inhibit the SNAT rule, we create a rule earlier in the table that prevents the usual SNAT being reached:
nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT

(We do it this way, rather than just deleting the existing SNAT rule, as docker seems to recreate the SNAT rule several times during the course of creating a service. This approach just supersedes that rule, which makes it more resilient).

  2. To create the container policy routing rule:
NID=$(docker inspect -f '{{.State.Pid}}' <container-id>)
nsenter -n -t $NID bash -c "ip route add table 1 default via 10.0.0.2 && ip rule add from 10.0.0.0/24 lookup 1 priority 32761"
  3. Finally, putting the above together with docker events, we automate the process of modifying the SNAT rules, watching for newly started containers, and adding the policy routing rules, via this ingress-routing-daemon script:
#!/bin/bash

# Ingress Routing Daemon
# Copyright © 2020 Struan Bartlett
# --------------------------------------------------------------------
# Permission is hereby granted, free of charge, to any person 
# obtaining a copy of this software and associated documentation files 
# (the "Software"), to deal in the Software without restriction, 
# including without limitation the rights to use, copy, modify, merge, 
# publish, distribute, sublicense, and/or sell copies of the Software, 
# and to permit persons to whom the Software is furnished to do so, 
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be 
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS 
# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN 
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 
# SOFTWARE.
# --------------------------------------------------------------------
# Workaround for https://github.com/moby/moby/issues/25526

echo "Ingress Routing Daemon starting ..."

read INGRESS_SUBNET INGRESS_DEFAULT_GATEWAY \
  < <(docker inspect ingress --format '{{(index .IPAM.Config 0).Subnet}} {{index (split (index .Containers "ingress-sbox").IPv4Address "/") 0}}')

echo INGRESS_SUBNET=$INGRESS_SUBNET
echo INGRESS_DEFAULT_GATEWAY=$INGRESS_DEFAULT_GATEWAY

# Add a rule ahead of the ingress network SNAT rule, that will cause the SNAT rule to be skipped.
echo "Adding ingress_sbox iptables nat rule: iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT"
while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -D POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT; do true; done 2>/dev/null
nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT

# Watch for container start events, and configure policy routing rules on each container
# to ensure return path traffic from incoming connections is routed back via the correct interface.
docker events \
  --format '{{.ID}} {{index .Actor.Attributes "com.docker.swarm.service.name"}}' \
  --filter 'event=start' \
  --filter 'type=container' | \
  while read ID SERVICE
  do
    if [ -n "$SERVICE" ]; then
    
      NID=$(docker inspect -f '{{.State.Pid}}' $ID)
      echo "Container ID=$ID, NID=$NID, SERVICE=$SERVICE started: applying policy route."
      nsenter -n -t $NID bash -c "ip route add table 1 default via $INGRESS_DEFAULT_GATEWAY && ip rule add from $INGRESS_SUBNET lookup 1 priority 32761"
    fi
  done

Now, when requests arrive at the published ports for the single node, its containers will see the original IP address of the machine making the request.

Usage

Run the above ingress-routing-daemon as root on each and every one of your swarm nodes before creating your service. (If your service is already created, then ensure you scale it to 0 before scaling it back to a positive number of replicas.) The daemon will initialise iptables, detect when docker creates new containers, and apply new routing rules to each new container.

Testing, use-cases and limitations

The above has been tested using multiple replicas constrained to a single node on a service running on a multi-node swarm.

It has also been tested using multiple nodes, each with a separate per-node service constrained to that node, but this comes with the limitation that different published ports must be used for each per-node service. Still that might work for some use-cases.

The method should also work using multiple nodes, if each were configured as a single node in its own swarm. This carries the limitation that the docker swarms can no longer be used to distribute containers across nodes, however there could still be other administration benefits of using docker services, such as container replica and lifecycle management.

Improving the workaround to address further use-cases

With further development, this method should be capable of scaling to multiple nodes without the need for separate per-node services or splitting the swarm. I can think of two possible approaches:

  1. Arranging for Docker, or a bespoke daemon, to remove all non-local IPs from each node’s ipvsadm table.
  2. Extending the policy routing rules to accommodate routing outgoing packets back to the correct node.

For 1, we could poll ipvsadm -S -n to look for new IPs added to any service, check whether each is local, and remove any that aren’t. This would allow each node to function as a load balancer for its own containers within the overall service, but without requests reaching one node being able to be forwarded to another. This would certainly satisfy my own use-case, where we have our own IPVS load balancer sitting in front of a set of servers, each running a web application, which we would like to replace with several load-balanced containerised instances of the same application, to allow us to roll out updates without losing a whole server.
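As a rough, hypothetical sketch of approach 1, the following filters `ipvsadm -S -n`-style rules and emits a delete command for every real server whose IP is not local to this node. The rule format and the canned sample input are assumptions standing in for live output; LOCAL_IPS would normally be derived from `ip -o addr`:

```shell
#!/bin/sh
# Sketch only: print 'ipvsadm -d' commands for non-local real servers.
LOCAL_IPS="10.0.0.5 172.31.24.209"   # hypothetical local addresses

filter_nonlocal() {
  # Reads `ipvsadm -S -n`-style rules on stdin. For each real-server rule
  # ("-a -t <vip:port> -r <real:port> ..."), prints a delete command if the
  # real server's IP is not in LOCAL_IPS.
  while read -r line; do
    case "$line" in
      "-a "*) ;;
      *) continue ;;
    esac
    vip=$(echo "$line" | awk '{print $3}')
    real=$(echo "$line" | awk '{print $5}')
    ip=${real%:*}
    case " $LOCAL_IPS " in
      *" $ip "*) ;;                             # local: keep
      *) echo "ipvsadm -d -t $vip -r $real" ;;  # non-local: delete
    esac
  done
}

# Canned sample standing in for:
#   nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -S -n
filter_nonlocal <<'EOF'
-A -t 10.255.0.2:80 -s rr
-a -t 10.255.0.2:80 -r 10.0.0.5:80 -m -w 1
-a -t 10.255.0.2:80 -r 10.0.0.6:80 -m -w 1
EOF
```

A daemon would poll the live table, pipe it through something like this, and execute the resulting delete commands in the ingress_sbox namespace.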

For 2, we could use iptables to assign a per-node TOS in each node’s ingress_sbox iptables (set, for example, to the final byte of the node’s ingress network IP); then, in the container, arrange to map the TOS value to a connection mark, then map the connection mark to a firewall mark for outgoing packets, and for each firewall mark select a different routing table that routes the packets back to the originating node. The rules for this will be a bit clunky, but I imagine they should scale fine to 2-16 nodes.
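A hedged sketch of what the rules for approach 2 might look like, for a node whose ingress IP is 10.0.0.2 (so TOS byte 2). All values are illustrative and the commands are printed rather than executed by default:

```shell
#!/bin/sh
# Sketch only: per-node TOS marking so containers can route replies back to
# the node a request arrived on. Prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

NODE_INGRESS_IP=10.0.0.2
TOS=${NODE_INGRESS_IP##*.}    # final byte of the node's ingress IP -> 2

# On the node, in the ingress_sbox namespace: stamp load-balanced packets.
run iptables -t mangle -A PREROUTING -m ipvs --ipvs -j TOS --set-tos "$TOS"

# In each service container: copy TOS -> connmark on the way in, restore it
# as a firewall mark on the way out, and route marked packets via a
# per-node routing table.
run iptables -t mangle -A PREROUTING -m tos --tos "$TOS" -j CONNMARK --set-xmark "$TOS"
run iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
run ip route add table "$TOS" default via "$NODE_INGRESS_IP"
run ip rule add fwmark "$TOS" lookup "$TOS"
```

A daemon would repeat the container-side rules once per node, with each node’s TOS value and gateway address.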

I hope the above comes in useful. I will also have a go at (2), and if I make progress will post a further update.

We just use haproxy to manage certs and offload SSL. People keep missing that the solution “running in host mode” is not a solution. They want it working with the ingress network, to take advantage of the Docker load balancing. The whole thread is basically a ‘use host mode’ -> ‘not possible because “reasons”’ circle which has been going on for 3 years now.

TBH I’m not sure why the ingress network is not being patched to add IP data via the proxy protocol.

It’s incremental, it won’t break existing stacks, it is a well defined standard, it’s widely supported by even the big cloud vendors, it’s widely supported by application frameworks.

Is it a significant Dev effort?

On Wed, 8 Aug, 2018, 21:30 Matt Glaser, notifications@github.com wrote:

@jamiejackson https://github.com/jamiejackson the “least bad” workaround we’ve found is using Traefik as a global service in host mode. They have a good generic example in their docs https://docs.traefik.io/user-guide/cluster-docker-consul/#full-docker-compose-file_1. We’ve seen some bugs that may or may not be related to this setup, but Traefik is a great project and it seems pretty stable on Swarm. There’s a whole thread on their issues page on it (that loops back here 😃 ), with similar workarounds: containous/traefik#1880 https://github.com/containous/traefik/issues/1880

Hope this helps. We also can’t use a solution that doesn’t allow us to check actual requester IPs so we’re stuck with this kludge fix until something changes. It seems like a pretty common need, for security reasons at least.


@cpuguy83 i have been following some of the incoming proxy protocol features in k8s. E.g. https://github.com/kubernetes/kubernetes/issues/42616 (P.S. interestingly the proxy protocol here is flowing in from the Google Kubernetes Engine, which supports proxy protocol natively in HTTPS mode).

In addition, ELB has added support for Proxy Protocol v2 in Nov 2017 (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/doc-history.html)

Openstack Octavia LB-as-a-service (similar to our ingress) merged proxy protocol last April - http://git.openstack.org/cgit/openstack/octavia/commit/?id=bf7693dfd884329f7d1169eec33eb03d2ae81ace

Here’s some of the documentation around proxy protocol in openstack - https://docs.openshift.com/container-platform/3.5/install_config/router/proxy_protocol.html Some of the nuances are around proxy protocol for https (both in cases when you are terminating certificates at ingress or not).

Examining the flow, it seems to currently work like this (in this example, node A receives the incoming traffic and node B is running the service container):

  • node A performs DNAT to direct the packet into the ingress_sbox network namespace (/var/run/docker/netns/ingress_sbox)
  • ingress_sbox on node A runs IPVS in NAT mode, which performs DNAT to direct the packet to the container on node B (via the ingress overlay network) and also SNAT to change the source IP to the node A ingress overlay network IP
  • the packet is routed through the overlay to the real server
  • the return packets follow the same path in reverse, rewriting the source/dest addresses back to the original values

I think the SNAT could be avoided with something like this:

  • node A passes the packet into ingress_sbox without any NAT (iptables/policy routing ?)
  • node A ingress_sbox runs IPVS in direct routing mode, which sends packet to node B via ingress overlay network
  • container on node B receives the unaltered packet (the container must accept packets for all public IPs, but not send ARP for them; there are several ways to do this, see the IPVS docs).
  • the return packets send directly from node B to the client (does not need to go back through the overlay network or node A)

As an added bonus, no NAT state needs to be stored and overlay network traffic is reduced.
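To make the proposal above concrete, a dry-run sketch of the IPVS side (direct routing is `-g` in ipvsadm; the addresses are made up, and the container-side commands implement the non-ARPable alias mentioned in the third bullet):

```shell
#!/bin/sh
# Sketch only: IPVS in direct routing mode on node A's ingress_sbox.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

PUBLIC_IP=198.51.100.7      # hypothetical published address

# On node A: virtual service with -g (direct routing: no NAT, no SNAT state).
run ipvsadm -A -t "$PUBLIC_IP:80" -s rr
run ipvsadm -a -t "$PUBLIC_IP:80" -r 10.0.0.5:80 -g

# Inside the container on node B: accept the public IP without ARPing for it,
# so replies go straight back to the client with the VIP as source.
run ip addr add "$PUBLIC_IP/32" dev lo
run sysctl -w net.ipv4.conf.all.arp_ignore=1
run sysctl -w net.ipv4.conf.all.arp_announce=2
```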

@Damidara16 that’s exactly what we don’t want to do. It’s really insecure to do that. You can bypass it however you want.

@BretFisher the mode: host is only a workaround but not the solution. As @sandys said, the workaround has a few caveats, so we should not consider this issue as fixed.

I’m not sure if there has been any improvement since the workaround was discovered. I moved to Kubernetes quite a long time ago and am still surprised that this issue has been open for over two years.

@r3pek While I agree with you that you lose Ingress if you use Host mode to solve this predicament, I’d say that it hardly defeats the whole purpose of Swarm, which does so much more than a public facing ingress network. In our usage scenario we have in the same overlay swarm: management replicated containers that should only be accessed over the intranet -> they don’t need the caller’s ip, therefore they are configured “normally” and take advantage of the ingress. non-exposed containers -> nothing to say about these (I believe you are underestimating the power of being able to access them via their service name though). public facing service -> this is an nginx proxy that does https and url based routing. It was defined global even before the need to x-forward-for the client’s real ip, so no real issue there.

Having nginx global and not having ingress means that you can reach it via any ip of the cluster, but it’s not load balanced or fault tolerant, so we added a very very cheap and easy to set up L4 Azure Load Balancer in front of the nginx service.

As you say, Host is a workaround, but saying that enabling it completely defeats the purpose of Docker Swarm is a little exaggerated imo.

@sanimej Yes, it is the expected behavior that should be on swarm mode as well.

@cpuguy83 this has started becoming a blocker for our larger swarm setups. As we start leveraging more of the cloud (where the proxy protocol is used de facto by load balancers), we are losing this info, which is very important to us.

Do you have any idea of an ETA ? this would help us a lot.

I agree. Swarm needs a high availability way to preserve source IP.

Probably using proxy protocol. I don’t think it’s a huge effort to add proxy protocol support to docker swarm.

Is anyone looking into this ?

On 28-Jan-2018 22:39, “Genki Takiuchi” notifications@github.com wrote:

A large drawback of that workaround is that it makes it impossible to avoid downtime during updates. Currently, we have to choose between giving up either stability or the source IP address.


@goetas mode=host worked for a while as a workaround, so I wouldn’t say the problem is somehow solved. Using mode=host has lots of limitations: the port is exposed directly on the host, you can’t use swarm load balancing, etc.

So the kubernetes documentation is not complete. Another approach, which is actually pretty common, is ingress + proxy protocol.

https://www.haproxy.com/blog/haproxy/proxy-protocol/

The proxy protocol is a widely accepted protocol that preserves source information. HAProxy comes with built-in support for the proxy protocol. Nginx can read, but not inject, the proxy protocol.

Once the proxy protocol is set up, you can access that information from any downstream service, e.g. https://github.com/nginxinc/kubernetes-ingress/blob/master/examples/proxy-protocol/README.md
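As an illustration of the “read” side, a minimal nginx sketch, assuming nginx sits behind a load balancer that already injects the PROXY protocol. The 10.255.0.0/16 trusted range and the `app` upstream are illustrative, not taken from any setup described in this thread:

```nginx
server {
    # Accept the PROXY protocol header on this listener.
    listen 80 proxy_protocol;

    # Trust the header only when it comes from the LB's range (illustrative).
    set_real_ip_from 10.255.0.0/16;
    real_ip_header   proxy_protocol;

    location / {
        # Pass the recovered client address on to the application.
        proxy_set_header X-Real-IP       $proxy_protocol_addr;
        proxy_set_header X-Forwarded-For $proxy_protocol_addr;
        proxy_pass http://app;
    }
}
```

With this in place, $remote_addr (after realip processing) and the forwarded headers reflect the original client rather than the proxy hop.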

Even openshift leverages this for source IP information https://docs.openshift.org/latest/install_config/router/proxy_protocol.html

This is the latest haproxy ingress for k8s that injects proxy protocol.

IMHO the way to do this in swarm is to make the ingress able to read proxy protocol (in case it’s receiving traffic from an upstream LB that has already injected proxy protocol) as well as inject proxy protocol information (in case all the traffic actually hits the ingress first).

I am not in favour of doing it any other way especially when there is a generally accepted standard to do this.
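For reference, the v1 header the ingress would need to inject is just a single text line sent ahead of the client’s data. A pure-shell sketch of building and parsing it (field layout per the haproxy spec; the addresses are examples):

```shell
#!/bin/sh
# Sketch only: build and parse a PROXY protocol v1 header line.

# Build: "PROXY TCP4 <src> <dst> <src-port> <dst-port>\r\n"
build_v1() { printf 'PROXY TCP4 %s %s %s %s\r\n' "$1" "$2" "$3" "$4"; }

# Parse: print the client (source) address from a v1 header line.
client_addr() {
  set -- $1                  # intentional word-splitting into fields
  [ "$1" = "PROXY" ] || return 1
  echo "$3"
}

HDR=$(build_v1 203.0.113.50 10.255.0.2 51234 80)
client_addr "$HDR"           # prints the original client IP, 203.0.113.50
```

The receiving service (haproxy, nginx, Traefik, ...) reads this line before the application payload and recovers the real client address from it.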

@mrjana The whole idea of using IPVS (instead of whatever docker currently does in swarm mode) would be to avoid translating the source IP to begin with. Adding an X-Forwarded-For might help for some HTTP applications, but it’s of no use whatsoever for all the other applications that are broken by the current behaviour.

@mavenugo it’s Koa’s request object, which uses Node’s remoteAddress from the net module. The result should be the same for any other library that can retrieve the remote address.

The expectation is that ip field should always be remote address regardless of any configuration.

@zimbres If you can raise an issue at https://github.com/newsnowlabs/docker-ingress-routing-daemon/issues outlining your setup, DIND version, and so on, and I’ll be pleased to respond there.

N.B. v3.3.0 has just been released, which is necessary to upgrade to for UDP-based services like DNS.

It is interesting to know, but note: this feature is available on Kubernetes but not in Docker Swarm mode, and you are insisting there are options to run multiple instances of Traefik, but only across multiple nodes. If I want to run multiple instances on a single node, it is not possible, because that is not supported. Also, any service that does more than just proxy requests cannot map any port, because it would need a special kind of configuration mapping every host to it, and in any case it needs multiple nodes, at least one per instance.

And so on, and so on. You can scroll up through this discussion and find other concerns about it. I do not think it can be reduced to a demonstration of how good you are at producing workarounds, because those remain workarounds that are hard to maintain and hard to follow, and all the time spent maintaining special-case workarounds would be better spent fixing the problem.

On the other hand, if this kind of feature is a security problem for the Docker Swarm model, just mark it as wontfix and I will plan to switch to Kubernetes. If that is the case, I do not think there is any conflict between the projects; it is just a matter of saying explicitly that it will never happen, so that everybody can take action, if possible before choosing Docker Swarm mode for any kind of swarm deployment.

So, I believe that this bug affects Traefik’s ability to whitelist IPs. Is that correct?

Anyway, for anybody looking to run swarm mode, this is an example with using host mode to publish ports.

docker service create \
--name traefik \
--constraint=node.role==manager \
--publish mode=host,target=80,published=80 \
--publish mode=host,target=443,published=443 \
--mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
--mount type=bind,source=/home/$USER/dev-ops/logs,target=/dev-ops/logs \
--mount type=bind,source=/opt/data/traefik/traefik.toml,target=/traefik.toml \
--mount type=bind,source=/opt/data/traefik/acme.json,target=/acme.json \
--network traefik \
--label traefik.frontend.rule=Host:traefik.example.com \
--label traefik.port=8080 \
traefik \
--docker \
--docker.swarmMode \
--docker.watch \
--docker.exposedByDefault

The host mode workaround has been discussed multiple times on this issue already. While it may be OK for some limited scenarios (such as certain reverse proxy web traffic setups), it is not a general solution to this problem. Please read the previous posts rather than re-hashing the same “solutions” over again.

Hi guys, is there a workaround for now, without publishing the port in host mode?

On 11-Jan-2018 00:03, “Olivier Voortman” notifications@github.com wrote:

We have the same issue. I’d vote for a transparent solution within docker ingress that’d allow all applications (some using raw UDP/TCP, not especially HTTP) to work as expected.

I could use the “mode=host port publishing” workaround as my service is deployed globally. However, it seems that this is incompatible with the use of the macvlan network driver, which I need for some other reasons. We get logs like “macvlan driver does not support port mappings”. I tried using multiple networks, but it does not help.

I created a specific ticket here : docker/libnetwork#2050 https://github.com/docker/libnetwork/issues/2050 That leaves me no solution for now 😢


This needs to be done at the Docker Swarm ingress level. If the ingress does not inject proxy protocol data, none of the downstream services (including Traefik, nginx, etc.) will be able to read it.

On 10-Sep-2017 21:42, “monotykamary” notifications@github.com wrote:

Traefik did add proxy_protocol support https://github.com/containous/traefik/pull/2004 a few weeks ago and is available from v1.4.0-rc1 onwards.


I’d just like to chime in; while I do understand that there is no easy way to do this, not having the originating IP address preserved in some manner severely hampers a number of application use cases. Here’s a few I can think of off the top of my head:

  • Being able to have metrics detailing where your users originate from is vital for network/service engineering.

  • In many security applications you need to have access to the originating IP address in order to allow for dynamic blacklisting based upon service abuse.

  • Location awareness services often need to be able to access the IP address in order to locate the user’s general location when other methods fail.

From my reading of this issue thread, it does not seem that the given work-around(s) work very well when you want to have scalable services within a Docker Swarm. Limiting yourself to one instance per worker node greatly reduces the flexibility of the offering. Also, maintaining a hybrid approach of having an LB/Proxy on the edge running as a non-Swarm orchestrated container before feeding into Swarm orchestrated containers seems like going back in time. Why should the user need to maintain 2 different paradigms for service orchestration? What about being able to dynamically scale the LB/Proxy at the edge? That would have to be done manually, right?

Could the Docker team perhaps consider these comments and see if there is some way to introduce this functionality, while still maintaining the quality and flexibility present in the Docker ecosystem?

As a further aside, I’m currently getting hit by this now. I have a web application which forwards authorized/authenticated requests to a downstream web server. Our service technicians need to be able to verify whether people have reached the downstream server, which they like to use web access logs for. In the current scenario, there is no way for me to provide that functionality as my proxy server never sees the originating IP address. I want my application to be easily scalable, and it doesn’t seem like I can do this with the work-arounds presented, at least not without throwing new VMs around for each scaled instance.

Just checking back in to see if there have been any new developments in getting this real-IP thing figured out? It certainly is a huge limitation for us as well.

Just to advise, we are now running docker swarm, in conjunction with the docker ingress-routing-daemon (documented above), in production on www.newsnow.co.uk, currently handling some 1,000 requests per second.

We run the daemon on all 10 nodes of our swarm, of which currently only two serve as load balancers for incoming web traffic, which direct traffic to containers running on a selection of 4 of the remaining nodes (the other nodes currently being used for backend processes).

Using the daemon, we have been able to avoid significant changes to our tech stack (no need for cloudflare or nginx) or to our application’s internals (which relied upon identifying the requesting client’s IP address for geolocation and security purposes).

I think a workaround for this, to have a Docker swarm run without setting host mode, is to get the IP on the client side, e.g. using JS for web and mobile clients, and only accept requests from trusted sources. For example: JS gets the IP, and the backend only accepts IPs that include a user token, etc. The IP can be set in a header and encrypted through HTTPS. However, I don’t know about the performance impact.

We have had success using the PROXY protocol with DigitalOcean LB -> Traefik -> Apache container. The Apache container was able to log the real IPs of the users hitting the service. Theoretically should work as long as all the proxy layers support PROXY protocol.

https://docs.traefik.io/v1.7/configuration/entrypoints/#proxyprotocol

The Traefik service is on one Docker network named ‘ingress’, the Apache service has its own stack network but is also part of the ‘ingress’ network as external.

https://autoize.com/logging-client-ip-addresses-behind-a-proxy-with-docker/

    deploy:
      mode: global
    ports:
      - target: 443
        published: 443
        protocol: tcp
        mode: host

Following this advice fixes the issue, as the Docker Swarm balancer is now out of the equation. For me it is a valid solution since it is still HA, and I already had haproxy (inside the Docker Flow Proxy container). The only issue is that the haproxy stats are distributed among all the replicas, so I need to somehow aggregate that info when monitoring traffic for the whole cluster. In the past I had just one haproxy instance behind the Docker Swarm balancer. Cheers, Jacq

hi guys, if you want Docker Swarm support in Cilium (especially for ingress and around this particular problem), please comment/like on this bug - https://github.com/cilium/cilium/issues/4159


@cpuguy83 hi, thanks for your reply. I’m aware there is no broad agreement on how you want to solve it. I’m commenting on how the team has been occupied with stability issues and is not freed up for this one. When do you think this issue will be taken up (if at all)?

This is VERY bad: it defeats any rate limiting, fraud prevention, logging, secure logins, session monitoring, etc.! Listening with mode: host works, but it is no real solution, as you lose mesh load balancing and the software load balancer on the host that holds the public IP has to handle all the traffic alone.

@mostolog mode: host doesn’t expose your container to the host network. It removes the container from the ingress network, which is how Docker normally operates when running a container. It would replicate the --publish 8080:8080 used in a docker run command. If nginx is getting real IPs, it’s not a result of the socket being connected to those IPs directly. To test this, you should seriously consider using a raw TCP implementation or an HTTP server without a framework, and check the reported address.

Too bad this is still an open issue; sadly, it doesn’t look like it’s going to be fixed soon.

I think it will be closed by the bot soon. Since github launched this feature, many bugs can be ignored.

Please let us know whether this issue will be fixed. Should we use Kubernetes instead?

Note you can solve this problem by running a global service and publishing ports using PublishMode=host. If you know which node people will be connecting on, you don’t even need that, just use a constraint to fix it to that node.

@blazedd In our stack we have:

    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host

and so I would expect we get real IPs in our logs.

Is a solution on the roadmap for Docker 1.14? We have delayed deploying our solutions due in part to this issue.

Yeah, pretty weird @mavenugo.

Regarding the publish mode, I had already linked this from SwarmKit above. It could be a workaround, but I truly hope a proper solution comes with Docker 1.13 to address this issue for good.

This issue could very much be categorized as a bug, because preserving the source IP is the behaviour we as users expect, and it’s a very serious limitation of Docker services right now.

I believe both @kobolog and @dack have come up with some potential leads on how to solve this and it’s been almost 2 weeks with no follow up on those from Docker side.

Could we please have some visibility on who is looking into this issue at Docker and a status update ? Thanks in advance.

The easiest way would be to add the header for the original IP for every http request.

Those are complex solutions - proxy protocol just adds additional header information and is a very well known standard - haproxy, nginx, AWS elb, etc all follow it. https://www.haproxy.com/blog/haproxy/proxy-protocol/

The surface area of the change would be limited to the Swarm built in ingress (where this support would be added). And all services will have it available.

On Fri, 4 Jan, 2019, 14:36 rubot notifications@github.com wrote:

You could even extend the dockerflow project and add an nginx variant to start a kubernetes-ingress-style proxy for Swarm. Definitely, all of this packed with Swarm would add additional system containers; as you know, there are a bunch of them with Kubernetes. Isn’t the strength of Swarm, for slim-resource projects, that it is lean?

Ruben Nicolaides ruben@rubot.de schrieb am Fr., 4. Jan. 2019, 09:48:

I’m still kind of surprised why people think this is a bug. From my perspective, even the suggestion of moving to kubernetes is not an adequate answer. As I see it, kubernetes has exactly the same problem/behavior. You either have an external LB, or use something like the nginx ingress proxy, which must run as a daemonset. Please correct me if I am wrong, but we have the exact same situation here, just no ready-made solution. Somebody could check and package my proposed tcp stream solution described above to get something like the nginx proxy behavior. Just accept that swarm needs to be customized by yourself

PanJ notifications@github.com schrieb am Fr., 4. Jan. 2019, 09:28:

@BretFisher https://github.com/BretFisher the mode: host is only a workaround, not the solution. As @sandys https://github.com/sandys said, the workaround has a few caveats, so we should not consider this issue fixed.

I’m not sure if there has been any improvement since the workaround was discovered. I moved to Kubernetes quite a long time ago and am still surprised that the issue has been open for over two years.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/moby/moby/issues/25526#issuecomment-451382365, or mute the thread https://github.com/notifications/unsubscribe-auth/AAPgu40OJ-uNKORD-LAD12m1lafxzMiSks5u_xCcgaJpZM4Jf2WK.

What has changed here ? Because we have been using host mode to do this for a long time now. In fact that is the workaround suggested in this thread as well.

The problem is that of course you have to lock this service to a particular host so Swarm can’t schedule it elsewhere. Which is what the issue was entirely - that proxy protocol/IPVS, etc solve this problem.

On Fri, 4 Jan, 2019, 09:34 Bret Fisher <notifications@github.com> wrote:

When reading the OP’s request ( @PanJ https://github.com/PanJ ), it seems current features now solve this problem, as has been suggested for months. The OP didn’t ask for ingress routing + client IP AFAIK; they asked for a way to have a swarm service in replica/global mode obtain client IPs, which is now doable. Two main areas of improvement allow this to happen:

  1. We can now create a Swarm service that “publishes” a port to the host IP, skipping the ingress routing layer
  2. That same service can attach to other networks like overlay at the same time, so it can access other services with overlay benefits

For me with 18.09 engine, I get the best of both worlds in testing. A single service can connect to backend overlay networks and also publish ports on the host NIC and see real client IP’s incoming on the host IP. I’m using that with traefik reverse proxy to log client IP traffic in traefik that is destined for backend services https://github.com/BretFisher/dogvscat/blob/7e9fe5b998f2cf86951df3f443714beb413d63fb/stack-proxy-global.yml#L75-L83. I feel like this could solve most requests I’ve seen for “logging the real IP”.

@PanJ https://github.com/PanJ does this solve it for you?

The key is to publish ports in mode: host rather than mode: ingress (the default).

The pro to this mode is you get real client IPs and native host NIC performance (since it’s outside the IPVS encapsulation AFAIK). The con is it will only listen on the node(s) running the replicas.

To me, the request of “I want to use ingress IPVS routing and also see client IP” is a different feature request of libnetwork.


When reading the OP’s request ( @PanJ ), it seems current features now solve this problem, as has been suggested for months. The OP didn’t ask for ingress routing + client IP AFAIK; they asked for a way to have a swarm service in replica/global mode obtain client IPs, which is now doable. Two main areas of improvement allow this to happen:

  1. We can now create a Swarm service that “publishes” a port to the host IP, skipping the ingress routing layer
  2. That same service can attach to other networks like overlay at the same time, so it can access other services with overlay benefits

For me with 18.09 engine, I get the best of both worlds in testing. A single service can connect to backend overlay networks and also publish ports on the host NIC and see real client IP’s incoming on the host IP. I’m using that with traefik reverse proxy to log client IP traffic in traefik that is destined for backend services. I feel like this could solve most requests I’ve seen for “logging the real IP”.

@PanJ does this solve it for you?

The key is to publish ports in mode: host rather than mode: ingress (the default).

The pro to this mode is you get real client IPs and native host NIC performance (since it’s outside the IPVS encapsulation AFAIK). The con is it will only listen on the node(s) running the replicas.

To me, the request of “I want to use ingress IPVS routing and also see client IP” is a different feature request of libnetwork.

to be fair - in k8s, it is possible to have a custom ingress. in swarm it is not.

swarm takes the stand that everything is “built-in”. Same is the case with networks - in k8s, you need to set up weave, etc… in swarm it’s built in.

so the point that andrey is making (and i kind of agree with) is that swarm should make these features part of the ingress, since the user has no control over it.

On Sat, Jul 28, 2018 at 5:07 PM Seti notifications@github.com wrote:

As far as i know, the difference is that even if you deploy such a loadbalancing service, it will be ‘called’ from the swarmkit loadbalancer and so you lose the user’s IP. So you cannot disable the swarmkit loadbalancer if not using host mode.

@bluejaguar @ruudboon I am part of Docker. This is a well known issue. Right now the network team is focused on long standing bugs with overlay networking stability. This is why there haven’t really been new networking features in the last few releases.

My suggestion would be to come up with a concrete proposal that you are willing to work on to resolve the issue or at least a good enough proposal that anyone could take it and run with it.

Is anyone in the recent part of this thread here to represent the docker team and at least say that ‘we hear you’ ? Seems quite something that a feature you would expect to be ‘out of the box’ and of such interest to the community is still not resolved after being first reported August 9th 2016, some 18 months ago.

@sandys https://github.com/sandys The proxy protocol looks like encapsulation (at least at connection initiation), which requires knowledge of the encapsulation from the receiver all the way down the stack. There are a lot of trade-offs to this approach.

That is true. That’s pretty much why it’s a standard with an RFC. There’s momentum behind this though - pretty much every important component supports it. IMHO it’s not a bad decision to support it.

I wouldn’t want to support this in core, but perhaps making ingress pluggable would be a worthwhile approach.

This is a larger discussion - however i might add that the single biggest advantage of Docker Swarm over others is that it has all batteries built-in.

I would still request you to consider proxy protocol as a great solution to this problem which has industry support.

The problem seems partially solved in 17.12.0-ce by using mode=host.

docker service create --publish mode=host,target=80,published=80 --name=nginx nginx

It has some limitations (no routing mesh) but works!

nginx supports IP Transparency using the TPROXY kernel module.

@stevvooe Can Docker do something like that too?
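For reference, a minimal sketch of the nginx side of IP Transparency (the upstream name and address are placeholders; this additionally requires the TPROXY kernel module, running nginx workers with the needed capability, and matching policy-routing rules on the host, per nginx’s IP Transparency write-up — the config alone is not sufficient):

```nginx
stream {
    upstream backend {
        server 10.0.0.2:443;  # placeholder backend address
    }
    server {
        listen 443;
        # Make the upstream connection originate from the real client IP,
        # so the backend sees it directly instead of the proxy's IP.
        proxy_bind $remote_addr transparent;
        proxy_pass backend;
    }
}
```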

@jerrac - as also explained here: https://github.com/newsnowlabs/docker-ingress-routing-daemon/issues/24#issuecomment-1157077824 :-

To be clear, DIND exists to transform Docker’s ingress routing mesh to use policy routing instead of SNAT, to redirect client traffic to service nodes. It will only work to preserve the client IP if incoming requests directly reach a load-balancer node on a port published for a service via the ingress routing mesh. DIND is a network-layer tool (IPv4) and cannot inspect or modify HTTP headers.

I understand Traefik has often been used as a reverse proxy to work around the same limitation as DIND. In this model, incoming requests must directly reach the reverse proxy, which presumably must not be using the ingress routing mesh, but instead have its ports published using host mode, and be launched using --mode global. The Traefik reverse proxy will see the client IP of requests and can add these to the XFF header before reverse-proxying them to an internal application service.

DIND therefore exists to solve a similar problem as a Traefik reverse proxy service placed in front of an internal application service, but without the need for the extra Traefik service (or for proxying, or for introduction/modification of XFF headers) and therefore without modification of the application service (if it doesn’t natively support XFF headers).

Combining DIND with Traefik should allow Traefik itself to be deployed using the ingress routing mesh, which could be useful if Traefik is providing additional benefits in one’s setup.

However, I’m not sure I can see a use-case for combining DIND with an internal application service published via the ingress routing mesh, and still fronted by a Traefik reverse proxy. Since the reverse proxy node is the client for the internal application service request, doing this will just expose the Docker network IP of that node, instead of the ingress network IP, to the internal application service.

Hope this makes sense.

IIRC it was necessary to use the “long form” of the ports definition, like so:

    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
      - target: 8080
        published: 8080
        protocol: tcp

any update?

@kaysond Not a good place to ask.

You are essentially asking two questions:

  1. How IPVS works technically, and
  2. Why libnetwork chose IPVS to start with

Both are hard to answer, in different ways.

I think you misunderstood my question. I understand why services would want to see the true source ip. I want to know why Docker changes it before it gets to a container

On Nov 1, 2019, 1:47 AM, Daniele Cruciani notifications@github.com wrote:

Maybe this is a naive question, but why is it necessary to rewrite the source ip to begin with? Wouldn’t the traffic be returned via the interface’s default gateway anyways? Even if it came via the swarm load balancer, the gateway could just return it via the load balancer which already knows where the traffic came from…

It is necessary to know which IP the request is coming from. Maybe a specific user wants to restrict access by IP, and you cannot do that outside of the running service; i.e. traefik does not know the content of the request that may specify which user is making it, so it cannot exclude some users and accept others based only on IP (because the policy in this example is ip + request-content => allow/disallow).

Or, more often, just for logging connections. I need to bill customers for my service usage, and I need to provide, in tabular form: time of request, amount of resources, source IP of request. Almost every billed service provides this kind of report.

– You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/moby/moby/issues/25526#issuecomment-548711563

What is your problem with the workaround ?

In our case, it’s the combination of this workaround with the inability to bind a host-exposed port to a specific IP address. Instead, all internal services that need the real visitor’s IP and support PROXY protocol, have their port exposed on 0.0.0.0 on the host which is less than optimal.

Another one is the non-negligible performance hit when you have hundreds of new connections per second - all the exposed ports are actually DNAT rules in iptables that require conntrack and have other problems (this hits k8s too, but Swarm has this additional level of NATs that makes it worse).

You can try to set up another Nginx server outside the docker swarm cluster and forward requests to the swarm service; in this Nginx conf just add the forwarding headers, e.g.:

    location / {
        proxy_pass http://phpestate;

        #Proxy Settings
        proxy_redirect     off;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
    }
It seems there is no solution to get real client ip in the docker swarm mode.

People keep missing that the solution “running in host mode” is not a solution.

It’s not a solution by itself, but can be used (and is being used) very successfully as a workaround. You can still use Docker’s native load balancer - all you’re doing is adding a layer to the host network stack before you hit Docker’s service mesh.

@ajardan that solution I have tried and it is not viable for me, as I need more than a single host responding on the frontend. Ideally I want the entire swarm to be able to route the requests. I agree that for small-scale operations, simply flipping one service to host mode and using it as an ingest server can work fine.

Placing something like traefik in host mode negates the benefits we are trying to take advantage of from using swarm though in most cases 😦

@thaJeztah https://github.com/thaJeztah Can someone on the Docker Inc team update us on the status of this issue. Is it still being considered and/or worked on ? Any ETA ? Or is this completely ignored since Docker integration with Kubernetes ? It has been reported almost 3 years ago 😕

It would really be good to get this statement (“won’t fix”) so I can fully justify a migration to kubernetes. Such a shame.

Thanks.

I have filed a feature request for proxy protocol support to solve the issue in this bug.

Just in case anyone wants to add their comments.

https://github.com/moby/moby/issues/39465

On Wed, 10 Apr, 2019, 21:37 Daniele Cruciani, notifications@github.com wrote:

@port22 https://github.com/port22 I got your point, but docker manages its networks by itself. I tried to make it work with shorewall, but the only way is to create exceptions for docker rules/chains, and I had no success with docker swarm mode (but it is ok for docker in swarm mode, as long as I disable all services but the ones running in the swarm). Maybe there should be options like there are for the bridge network https://docs.docker.com/network/overlay/#customize-the-docker_gwbridge-interface to make this simple to set up, but the main problem is still the missing support in the overlay network. So the options are not there, because they would be ignored, and dockerd will rewrite the rules if they are modified from outside.

As an alternative, can’t swarm just take the original source ip and create an X-Forwarded-For header?

See #25526 (comment) https://github.com/moby/moby/issues/25526#issuecomment-367642600; X-Forwarded-For is an L7 protocol; Swarm ingress is L4, using IPVS with DNAT.

the right solution here is proxy protocol injected at L4 . there are some relevant pro and con discussions in Envoy for the same usecase https://github.com/envoyproxy/envoy/issues/4128 and https://github.com/envoyproxy/envoy/issues/1031

On Wed, Apr 10, 2019 at 1:40 AM Sebastiaan van Stijn < notifications@github.com> wrote:

Nobody would use just a single host as a reverse-proxy. You want multiple hosts with a floating ip, and the swarm-mesh is mandatory to achieve this setup.

Each node in the swarm can run an instance of the reverse-proxy, and route traffic to the underlying services over an overlay network (but only the proxy would know about the original IP-address).

Make sure to read the whole thread (I see GitHub hides quite some useful comments, so you’ll have to expand those 😞);

As an alternative, can’t swarm just take the original source ip and create a X-Forwarded-For Header?

See #25526 (comment) https://github.com/moby/moby/issues/25526#issuecomment-367642600; X-Forwarded-For is L7 protocol; Swarm ingress is L4, using IPVS with DNAT

There are lots of features in kubernetes that are not in swarm, and vice versa. We all make decisions on which orchestrator to use for a specific solution based on many factors, including features. No one tool solves all problems/needs.

I’m just a community member trying to help. If you don’t like the current solutions for this problem, then it sounds like you should look at other ways to solve it, possibly with something like kubernetes. That’s a reasonable reason to choose one orchestrator over another if you think the kubernetes way of solving it is more to your liking.

Historically, the moby and swarm maintainers don’t close issues like this as wontfix, because tomorrow someone from the community could drop a PR with a solution to this problem. Also, I think discussing ways to work around it until then is a valid use of this issue thread. 😃

While not a swarm maintainer, I can say that historically the team doesn’t disclose future feature plans beyond what PR’s you can currently see getting commits in the repos.

I’m still kind of surprised why people think this is a bug. From my perspective, even the suggestion of moving to kubernetes is not an adequate answer. As I see it, kubernetes has exactly the same problem/behavior. You either have an external LB, or use something like the nginx ingress proxy, which must run as a daemonset. Please correct me if I am wrong, but we have the exact same situation here, just no ready-made solution. Somebody could check and package my proposed tcp stream solution described above to get something like the nginx proxy behavior. Just accept that swarm needs to be customized by yourself

PanJ notifications@github.com schrieb am Fr., 4. Jan. 2019, 09:28:

@BretFisher https://github.com/BretFisher the mode: host is only a workaround, not the solution. As @sandys https://github.com/sandys said, the workaround has a few caveats, so we should not consider this issue fixed.

I’m not sure if there has been any improvement since the workaround was discovered. I moved to Kubernetes quite a long time ago and am still surprised that the issue has been open for over two years.

Well, Docker does not currently touch ingress traffic, so this would definitely not be insignificant to add. Keep in mind also that this is an open source project; if you really want something, then it’s generally going to be up to you to implement it.

As far as i know, the difference is that even if you deploy such a loadbalancing service, it will be ‘called’ from the swarmkit loadbalancer and so you lose the user’s IP. So you cannot disable the swarmkit loadbalancer if not using host mode.

@Mobe91 Try to recreate the swarm. I also had an error. After re-initializing the swarm, everything worked for me. My docker-compose.yml file:

version: "3.6"

services:
    nginx:
        image: nginx:latest
        depends_on:
            - my-app
            - my-admin
        ports: 
            - target: 80
              published: 80
              protocol: tcp
              mode: host
            - target: 443
              published: 443
              protocol: tcp
              mode: host
            - target: 9080
              published: 9080
              protocol: tcp
              mode: host
        volumes:
            - /etc/letsencrypt:/etc/letsencrypt:ro
            - /home/project/data/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
            - /home/project/data/nginx/conf.d:/etc/nginx/conf.d
            - /home/project/public:/var/public
        networks:
            - my-network
            - bridge
        deploy:
            placement:
                constraints: [node.role == manager]

    my-app:
        image: my-app
        ports:
            - 8080:8080
        volumes:
            - /usr/src/app/node_modules
            - /home/project/public:/usr/src/app/public
        networks:
            - my-network

    my-admin:
        image: my-admin
        ports:
            - 9000:9000
        networks:
            - my-network

networks:
    my-network:
    bridge:
        external: true
        name: bridge

my docker version:

Client:
 Version:	18.03.0-ce
 API version:	1.37
 Go version:	go1.9.4
 Git commit:	0520e24
 Built:	Wed Mar 21 23:10:01 2018
 OS/Arch:	linux/amd64
 Experimental:	false
 Orchestrator:	swarm

Server:
 Engine:
  Version:	18.03.0-ce
  API version:	1.37 (minimum version 1.12)
  Go version:	go1.9.4
  Git commit:	0520e24
  Built:	Wed Mar 21 23:08:31 2018
  OS/Arch:	linux/amd64
  Experimental:	false

Sorry for my English.

@kleptog Partially; you can’t avoid downtime while updating a service.

Seems like something everyone would want at some point, and since using overlay networks together with bridge/host networking is not really possible, this is a blocker in cases when you really need the client IP for various reasons.

Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:03:51 2017
 OS/Arch:	darwin/amd64

Server:
 Engine:
  Version:	17.12.1-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.4
  Git commit:	7390fc6
  Built:	Tue Feb 27 22:17:54 2018
  OS/Arch:	linux/amd64
  Experimental:	true

@sandys The proxy protocol looks like encapsulation (at least at connection initiation), which requires knowledge of the encapsulation from the receiver all the way down the stack. There are a lot of trade-offs to this approach.

I wouldn’t want to support this in core, but perhaps making ingress pluggable would be a worthwhile approach.

@cpuguy83 couldn’t understand what you just meant.

Proxy protocol is layer 4. http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

The PROXY protocol’s goal is to fill the server’s internal structures with the information collected by the proxy that the server would have been able to get by itself if the client was connecting directly to the server instead of via a proxy. The information carried by the protocol are the ones the server would get using getsockname() and getpeername() :

  • address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  • socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  • layer 3 source and destination addresses
  • layer 4 source and destination ports if any

http://cbonte.github.io/haproxy-dconv/1.9/configuration.html#5.1-accept-proxy

accept-proxy

Enforces the use of the PROXY protocol over any connection accepted by any of the sockets declared on the same line. Versions 1 and 2 of the PROXY protocol are supported and correctly detected. The PROXY protocol dictates the layer 3/4 addresses of the incoming connection to be used everywhere an address is used, with the only exception of “tcp-request connection” rules which will only see the real connection address. Logs will reflect the addresses indicated in the protocol, unless it is violated, in which case the real address will still be used. This keyword combined with support from external components can be used as an efficient and reliable alternative to the X-Forwarded-For mechanism which is not always reliable and not even always usable. See also “tcp-request connection expect-proxy” for a finer-grained setting of which client is allowed to use the protocol.

Did you mean there is a better way than proxy protocol? That’s entirely possible, and I would love to know more in the context of source IP preservation in docker swarm. However, Proxy Protocol is more widely supported by other tools (like nginx, etc.) that sit downstream of swarm-ingress… as well as tools like AWS ELB that sit upstream of swarm-ingress. That was my only $0.02
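To make the wire format concrete, here is a small illustrative Python sketch (not Docker code, just an example against the haproxy spec quoted below) that builds and parses the human-readable PROXY protocol v1 header line:

```python
def build_proxy_v1(src_ip, dst_ip, src_port, dst_port, family="TCP4"):
    """Build a PROXY protocol v1 header (text form, CRLF-terminated)."""
    return f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n"

def parse_proxy_v1(header):
    """Parse a PROXY v1 header into its fields; raise ValueError if malformed."""
    if not header.endswith("\r\n"):
        raise ValueError("PROXY v1 header must end with CRLF")
    parts = header[:-2].split(" ")
    if len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError("not a valid PROXY v1 header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    return {
        "family": family,
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "src_port": int(src_port),
        "dst_port": int(dst_port),
    }

header = build_proxy_v1("192.0.2.10", "198.51.100.1", 56324, 443)
print(header.strip())  # PROXY TCP4 192.0.2.10 198.51.100.1 56324 443
```

The point being: the header is prepended once at connection setup by the proxy (here, hypothetically, the swarm ingress), and the receiving server reads it before the application data — which is why the receiver must understand the protocol.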

These are L7 protocols. Swarm ingress is L4. There is nothing being reinvented here, it’s all IPVS using DNAT.

A few concerns with proxy protocol:

Is it decoded by docker itself, or by the application? If we are relying on the application to implement proxy protocol, then this is not a general solution for all applications and only works for web servers or other applications that implement proxy protocol. If docker unwraps the proxy protocol and translates the address, then it will also have to track the connection state and perform the inverse translation on outgoing packets.

I’m not in favor of a web-specific solution (relying on proxy protocol in the application), as docker is useful for many non-web applications as well. This issue should be addressed for the general case of any TCP/UDP application - nothing else in docker is web-specific.

As with any other encapsulation method, there is also the concern of packet size/MTU issues. However, I think this is probably going to be a concern with just about any solution to this issue. The answer to that will likely be to make sure your swarm network supports a large enough MTU to allow for the overhead. I would think most swarms are run on local networks, so that’s probably not a major issue.
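As an aside, overlay MTU can be tuned at network-creation time via a driver option; a hedged sketch (the network name and the value 1450 are illustrative, and this requires a running swarm):

```shell
# Create an overlay network with a lower MTU to leave headroom for
# encapsulation overhead.
docker network create -d overlay \
  --opt com.docker.network.driver.mtu=1450 \
  my-overlay
```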

@trajano - We know it works with host networking (which is likely what your compose solution is doing). However, that throws out all of the cluster networking advantages of swarm (such as load balancing).

this is a very critical and important bug for us and this is blocking our go-live with Swarm. We also believe proxy protocol is the right solution for this. Docker ingress must pass source ip on proxy protocol.

On twitter one of the solutions that has been proposed is to use Traefik as ingress managed outside of Swarm. This is highly suboptimal for us - and not an overhead that we would like to manage.

If the Swarm devs want to check out how to implement proxy protocol in Swarm-ingress, they should check out all the bugs being discussed in Traefik (e.g. https://github.com/containous/traefik/issues/2619)

@sandys I agree. Proxy protocol would be a great idea. @thaJeztah @aluzzardi @mrjana could this issue get some attention please? There hasn’t been any response from the team for a while. Thank you.

Hi.

For the sake of understanding and completeness, let me summarize and please correct me if I’m wrong:

The main issue is that containers aren’t receiving original src-IP but swarm VIP. I have replicated this issue with the following scenario:

create docker swarm
docker service create --name web --publish 80:80 nginx
access.log source IP is 10.255.0.7 instead of client's browser IP

It seems:

When services within the swarm are using the (default) mesh, swarm does NAT to ensure traffic from the same origin is always sent to the same host-running-service? Hence, it’s losing the original src-IP and replacing it with the swarm service’s VIP.
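One way to observe this (assuming root on a swarm node; the ingress namespace path matches the one used elsewhere in this thread, though it may vary by Docker version):

```shell
# List the NAT rules in the ingress network namespace. The SNAT/MASQUERADE
# rule shown here is what replaces the client's source IP with an
# ingress-network IP (e.g. 10.255.0.x) before traffic reaches the task.
nsenter --net=/var/run/docker/netns/ingress_sbox \
  iptables -t nat -S POSTROUTING
```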

Seems @kobolog ’s https://github.com/moby/moby/issues/25526#issuecomment-258660348 and @dack ’s https://github.com/moby/moby/issues/25526#issuecomment-260813865 proposals were refuted by @sanimej https://github.com/moby/moby/issues/25526#issuecomment-280722179 https://github.com/moby/moby/issues/25526#issuecomment-281289906 but, TBH, his arguments aren’t fully clear to me yet, nor do I understand why the thread hasn’t been closed if this is definitively impossible. @stevvooe ?

@sanimej wouldn’t this work?:

  1. Swarm receives message with src-IP=A and destination=“my-service-virtual-address”
  2. Package is sent to a swarm node running that service, encapsulating the original msg.
  3. Node forwards to the task, changing the destination to the IP of the container running that service. Swarm and nodes could maintain tables to ensure traffic from the same origin is forwarded to the same node whenever possible.

Wouldn’t an option to enable “reverse proxy instead of NAT” for specific services solve all this issues satisfying everybody?

On the other hand, IIUC, the only option left is to use https://docs.docker.com/engine/swarm/services/#publish-a-services-ports-directly-on-the-swarm-node, which -again IIUC- seems to be like not using mesh at all, hence I don’t see the benefits of using swarm mode (vs compose). In fact, it looks like pre-1.12 swarm, needing Consul and so.

Thanks for your help and patience. Regards

Would love to see a custom header added to the http/https request which preserves the client-ip. This should be possible, shouldn’t it? I don’t mind when X_Forwarded_for is overwritten, I just want to have a custom field which is only set the very first time the request enters the swarm.

Load balancing is done at L3/4. Adding an http header is not possible.

A fix will involve removing the rewrite of the source address.

How did that happen?

@aluzzardi @mrjana Any update on this please ? A little bit of feedback from Docker would be very much appreciated.

Re: traefik - don’t you also have to deploy it as global? Maybe for a single node it doesn’t matter, but for multiple I don’t believe the mesh network will route traffic.

@struanb Awesome! Thanks again for the workaround. I’ll start further discussions on the repository when appropriate to avoid derailing this bug report.

(1: It seems I copied the wrong line, I meant to refer to this line where 10.0.0.0/24 is hardcoded)

while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -D POSTROUTING -d 10.0.0.0/24 -m ipvs --ipvs -j ACCEPT; do true; done 2>/dev/null

@Vaults Thanks for testing the daemon and for your feedback.

  1. I’m not sure the SNAT-deletion rule is different from the line in my version of the code. At least I can’t spot any difference!
  2. You’ve identified a real issue here. I like your idea of detecting the interface name. I reckon your approach is going to be reasonably resilient. I’ll look at incorporating this into the next version of the daemon, along with logic that skips configuring the container if an interface on the ingress network cannot be found.

FYI We have now published our latest version 2.5.1 of the daemon at https://github.com/newsnowlabs/docker-ingress-routing-daemon. This version includes:

  • Improved logging
  • Better error handling when dockerd is stopped or restarted, and the script’s calls to docker fail
  • Sets sysctl variables and firewall rules on load-balancing nodes within the ingress network namespace that are required to obtain high performance (net.ipv4.vs.conn_reuse_mode=0, net.ipv4.vs.expire_nodest_conn=1 and net.ipv4.vs.expire_quiescent_template=1 and iptables -t raw -I PREROUTING -p tcp -j CT --notrack). These took us some days to work out and track down and I strongly recommend upgrading to use them if your load-balancer nodes might receive high traffic levels.
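
For reference, those settings amount to commands along these lines, run inside the ingress namespace on each load-balancing node (a sketch based on the values listed above; the netns path is the Docker default and may differ on your system):

```shell
# Tune IPVS connection handling inside the ingress_sbox namespace
# (values taken from the list above).
nsenter --net=/var/run/docker/netns/ingress_sbox sysctl -w \
    net.ipv4.vs.conn_reuse_mode=0 \
    net.ipv4.vs.expire_nodest_conn=1 \
    net.ipv4.vs.expire_quiescent_template=1

# Disable conntrack for TCP in the ingress namespace to cut per-packet overhead.
nsenter --net=/var/run/docker/netns/ingress_sbox \
    iptables -t raw -I PREROUTING -p tcp -j CT --notrack
```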

@struanb Thank you for the workaround.

For some reason it fully works (after the changes mentioned below) except for all the containers on one machine. I’ve checked all the iptables on the host, ingress node, containers and they all seem to be pretty identical. The connection times out nevertheless. Maybe the packet keeps getting rerouted forever? I may resume testing further, but does anyone have some ideas for debugging?

I’ve also made a few changes for my situation, maybe I didn’t fully understand what is going on in the script, but for me these were necessary:

  1. When deleting the SNAT rules on the host, I changed the rule to match the ingress subnet instead of the wider scope, so that it actually gets deleted:
while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t mangle -D POSTROUTING -d $INGRESS_SUBNET -j TOS --set-tos $NODE_ID/0xff; do true; done 2>/dev/null 
  2. I’ve had some containers that didn’t use eth0 as the lead interface for the ingress gateways. They gave an ‘Error: Nexthop has invalid gateway.’, so I added a small implementation to fetch the interface for sanity (not production ready):
        # 5. Route outgoing traffic to the correct node's ingress network IP, according to its firewall mark
        #    (which in turn came from its connection mark, its TOS value, and ultimately its IP).
        
        # First, we get the interface that uses the gateway 
        CIF=$(nsenter -n -t $NID ip route list | grep $INGRESS_SUBNET | cut -d " " -f 3)

        if [ -z "$CIF" ]; then
                echo "No proper container interface found. Does this service have exposed ports? Printing nsenter: "
                nsenter -n -t $NID ip route list
        fi
        nsenter -n -t $NID ip route add table $NODE_ID default via $NODE_IP dev $CIF

As long as people post about it and don’t work to fix it, we’ll see it. There is very little time currently going into swarm from anyone.

@beornf @sebastianfelipe Adding to the context, CloudFlare also adds X-Forwarded-For and is largely free.

I think this could work for a lot of us who need a way to get the real IP. Cloudflare can be set as a proxy or as DNS only. It fits perfectly even for non-DigitalOcean customers. It is the cleanest workaround so far. But I agree with @beornf, we need a real solution, without depending on DigitalOcean or Cloudflare to get this done.

Thanks!

@sebastianfelipe that’s a big claim after all these years. You sure you’re not using host mode or other workarounds in this thread?

I use a managed HAIP, but you could use something else in front of the swarm, a standalone nginx load balancer that points to the IPs of your swarm. https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/
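
A standalone balancer like that could be sketched roughly as follows (hypothetical node addresses; uses the nginx stream module with the PROXY protocol so the client address survives the hop):

```nginx
# Hypothetical standalone nginx in front of the swarm.
stream {
    upstream swarm_nodes {
        server 192.0.2.10:443;  # assumed swarm node addresses
        server 192.0.2.11:443;
        server 192.0.2.12:443;
    }
    server {
        listen 443;
        proxy_pass swarm_nodes;
        proxy_protocol on;  # prepend the PROXY header carrying the client IP
    }
}
```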

In your swarm, the reverse proxy needs this:

server {
        listen 443 ssl proxy_protocol;
        location / {
                proxy_set_header   X-Real-IP $proxy_protocol_addr;  # this is the real IP address
        }
}

If you are running a swarm, you will need a load balancer to round-robin the requests to your swarm (or sticky, etc).

So far, this architectural decision may seem like a “missing piece”; however, it adds flexibility by providing options and removing the need to disable inbuilt functionality in order to replace it with something more suitable to the application’s needs.

Too bad this is still an open issue; sadly, it doesn’t look like it’s going to be fixed soon.

Out of curiosity… can some dev point me to the code that manages swarm networking?

I wonder where the best place to ask these questions is, because I am now very intrigued to read the history of those choices and how it all works, so I can get some more context here.

@kaysond Not a good place to ask.

You are essentially asking two questions:

  1. How IPVS works technically, and
  2. Why libnetwork chose IPVS in the first place.

Both are hard to answer, in different ways.

Maybe this is a naive question, but why is it necessary to rewrite the source ip to begin with? Wouldn’t the traffic be returned via the interface’s default gateway anyways? Even if it came via the swarm load balancer, the gateway could just return it via the load balancer which already knows where the traffic came from…

The best way is to come up with a proposal, ie. what do you expect the architecture to look like after the work is done. What does it bring? What do we lose?

Already done here: #39465

try host-mode-networking

Please read the whole thread before commenting

@pattonwebz Host mode can be enabled for a service running multiple containers on multiple hosts, you can even do that with mode=global. Then traefik will run on all your swarm nodes and accept connections to specified ports, then route the requests internally to services that need to see these connections.

I used this setup with a service in global mode but limited to manager nodes, and it was working perfectly fine for tens of thousands of requests/s

I would be happy to elaborate if more details are required.

I’m also having the same problem, but with haproxy. Though it’s OK to have proxy servers in host mode and HA using keepalived, the only missing part would be load balancing, which I think is not much of an issue for a simple web proxy, unless complicated scripts are involved, or the proxy and backend are not on the same physical machine and network traffic is too high for one NIC, and so on.

Nobody would use just a single host as a reverse-proxy. You want multiple hosts with a floating ip, and the swarm-mesh is mandatory to achieve this setup.

Each node in the swarm can run an instance of the reverse-proxy, and route traffic to the underlying services over an overlay network (but only the proxy would know about the original IP-address).

Make sure to read the whole thread (I see GitHub hides quite some useful comments, so you’ll have to expand those 😞);

As an alternative, can’t swarm just take the original source ip and create a X-Forwarded-For Header?

See https://github.com/moby/moby/issues/25526#issuecomment-367642600; X-Forwarded-For is L7 protocol; Swarm ingress is L4, using IPVS with DNAT

Would it be possible to paste the whole nginx config for nginx_stream and nginx_proxy with their Swarm configs? This would be awesome if it works!

@sandys Something like this: https://gist.github.com/rubot/10c79ee0086a8a246eb43ab631f3581f

We switched to a global proxy_protocol nginx stream instance (nginx_stream) in host mode, which forwards to the replicated application proxy (nginx_proxy). This works well enough for the moment.

service global nginx_stream

stream {
    resolver_timeout 5s;
    # 127.0.0.11 is docker swarm's dns server
    resolver 127.0.0.11 valid=30s;
    # set does not work in stream module, using map here
    map '' $upstream_endpoint {
        default proxy_nginx:443;
    }

    server {
        listen 443;
        proxy_pass $upstream_endpoint;
        proxy_protocol on;
    }
}

service replicated nginx_proxy

server {
    listen 443 ssl http2 proxy_protocol;
    include /ssl.conf.include;

    ssl_certificate /etc/nginx/certs/main.crt;
    ssl_certificate_key /etc/nginx/certs/main.key;

    server_name example.org;

    auth_basic           "closed site";
    auth_basic_user_file /run/secrets/default.htpasswd;

    # resolver info in nginx.conf
    set $upstream_endpoint app;
    location / {
        # relevant proxy_set_header in nginx.conf
        proxy_pass http://$upstream_endpoint;
    }
}

@jamiejackson the “least bad” workaround we’ve found is using Traefik as a global service in host mode. They have a good generic example in their docs. We’ve seen some bugs that may or may not be related to this setup, but Traefik is a great project and it seems pretty stable on Swarm. There’s a whole thread on their issues page on it (that loops back here 😃 ), with similar workarounds: https://github.com/containous/traefik/issues/1880

Hope this helps. We also can’t use a solution that doesn’t allow us to check actual requester IPs so we’re stuck with this kludge fix until something changes. It seems like a pretty common need, for security reasons at least.

Not 100% sure what you mean, but externally we use a DNS with an A record per cluster node. This provides cheap “balancing” without an external moving part. When a client makes a request, they choose a random A record and connect to 443 on one of the cluster nodes.

There, the reverse proxy that is running on that specific node and listening on 443 gets a native connection, including the actual client IP. That reverse proxy container then adds a header and forwards the connection to another internal container using the swarm overlay network (tasks.backend). Since it uses the tasks.backend target, it will also get a random A record for an internal service.

So in the strict sense, it is bypassing the magic of the overlay network that redirects the connection. It instead more or less replicates this behavior with the reverse proxy, and adds a header. The final effect is the same (in a loose sense) as the magic of the overlay network. It also does this in parallel to running the swarm, meaning I can run all my other services that do not require the client IP on the same cluster without doing anything else for those.

By no means a perfect solution but until a fix is made (if ever) it gets you by without external components or major docker configuration.

@sandys sure, here is an excerpt from our docker-compose with the relevant containers.

This is the reverse proxy docker-compose entry:

reverseproxy:
    image: yourorg/repo-proxy:latest
    networks:
      - network_with_backend_service
    deploy:
      mode: global
    ports:
      - target: 443
        published: 443
        protocol: tcp
        mode: host

This is the backend service entry:

backendservice:
    image: yourorg/repo-backend:latest
    networks:
      - network_with_backend_service
    deploy:
      replicas: 2

The target of the reverseproxy (the backend side) would be tasks.backendservice (which has A records for every replica). You can skip the networks part if the backend service is on the default swarm overlay network.

The global bit says “deploy this container exactly once on every Docker swarm node”. The ports mode: host is the part saying “bind to the native NIC of the node”.

Hope it helps.

@jamiejackson that’s where things will be a bit different. In our case we are running a server that hosts long-running SSL connections and a custom binary protocol underneath so HTTP proxies were not possible. So we created a simple TCP forwarder and used a “msgpack” header that we could unpack manually on the internal server.

I’m not super familiar with HTTP proxies but I suspect most of them would do the trick for you. 😕

@adijes, and other users who are facing this issue: you can bind the containers to the bridge network (as mentioned by someone in this thread).

version: "3.4"

services:
  frontend:
    image: nginx
    deploy:
      placement:
        constraints:
          - node.hostname == "prod1"
    networks:
      - default
      - bridge
  # backend services...
  # ...

networks:
  bridge:
    external:
      name: bridge

Our frontend is bound to bridge and always stays on one specific host, whose IP is bound to our public domain. This enables it to receive the real user IP. And because it is also bound to the default network, it can connect to the backend services.

You can also scale the frontend, as long as you keep it on that single host. This makes the host a single point of failure, but (I think) it’s OK for a small site.

Edited to add more information:

My nginx containers are behind https://github.com/jwilder/nginx-proxy, and I also use https://github.com/JrCs/docker-letsencrypt-nginx-proxy-companion to enable SSL. The nginx-proxy is run via a docker run command, not as a docker swarm service. Perhaps that’s why I get the real IP from clients. The bridge network is required to allow my nginx containers to communicate with nginx-proxy.
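
For context, jwilder/nginx-proxy is typically started as a plain container along these lines (a sketch based on that project’s README; flags may vary by version):

```shell
# Run nginx-proxy as a standalone container (not a swarm service), published
# directly on the host so it sees real client addresses.
docker run -d -p 80:80 -p 443:443 \
    -v /var/run/docker.sock:/tmp/docker.sock:ro \
    jwilder/nginx-proxy
```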

FWIW, I’m using:

Client:
 Version:      17.09.1-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:23:40 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:25:03 2017
 OS/Arch:      linux/amd64
 Experimental: false

Above setup also works on another setup, which is running:

Client:
 Version:      17.09.1-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:23:40 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:25:03 2017
 OS/Arch:      linux/amd64
 Experimental: false

@dack Backends must understand the proxy protocol. I think it solves most cases, and at the very least you can put a thin passthrough-like proxy that processes the protocol header in front of your backends inside containers. Because the lack of this information is a critical issue, I believe it should be solved as quickly as possible, ahead of any neater solution.
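
As an illustration of how light such a passthrough header is, here is a minimal sketch (not from this thread) of parsing a PROXY protocol v1 preamble in Python; real deployments would rely on the proxy-protocol support built into nginx or HAProxy:

```python
def parse_proxy_v1(data: bytes):
    """Parse a PROXY protocol v1 preamble (as sent by HAProxy/nginx).

    Returns (client_ip, client_port, remaining_payload), or None when no
    PROXY header is present. Sketch only: no v2 or UNKNOWN handling.
    """
    if not data.startswith(b"PROXY "):
        return None
    header, _, payload = data.partition(b"\r\n")
    # Header looks like: PROXY TCP4 <src-ip> <dst-ip> <src-port> <dst-port>
    _, _proto, src_ip, _dst_ip, src_port, _dst_port = header.decode("ascii").split(" ")
    return src_ip, int(src_port), payload


raw = b"PROXY TCP4 203.0.113.7 10.0.0.2 56324 443\r\nGET / HTTP/1.1\r\n"
print(parse_proxy_v1(raw))
```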

It is really a pity that it is not possible to get the client’s IP; this makes most of Docker Swarm’s nice features unusable.

On my setup the only way to get the client’s IP is to use network_mode:host and not use swarm at all.

using mode=host port publishing or a traditional docker run -p "80:80" ... did not work

Some solutions were suggested in https://github.com/moby/moby/issues/15086 but the only solution that worked for me was “host” networking…

I’m running up against this issue again.

My setup is as follows:

  • ipvs load balancer in DR mode (external to the docker swarm)
  • 3 docker nodes, with destination IP added to all nodes and arp configured appropriately for IPVS DR routing

I would like to deploy a stack to the swarm and have it listen on port 80 on the virtual IP without mangling the addresses.

I can almost get there by doing this:

ports:
  - target: 80
    published: 80
    protocol: tcp
    mode: host

The problem here is that it doesn’t allow you to specify which IP address to bind to - it just binds to all. This creates problems if you want to run more than one service using that port. It needs to bind only to the one IP. Using different ports isn’t an option with DR load balancing. It seems the devs assumed that the same IP will never exist on multiple nodes, which is not the case when using a DR load balancer.

In addition, if you use the short syntax, it will ignore the bind IP and still bind to all addresses. The only way I’ve found to bind to a single IP is to run a non-clustered container (not a service or stack).

So now I’m back to having to use standalone containers and having to manage them myself instead of relying on service/stack features to do that.

@blazedd Have you tried it? I’m getting external ip addresses when following @mostolog’s example.

Why not have IPVS route traffic to the container directly? Bind all swarm nodes’ overlay interface IPs as virtual IPs, use ip rule from xxx table xxx to build multiple gateways, then swarm nodes can route clients to containers directly (DNAT), without any userspace network proxy daemon (dockerd).

mirroring the comment above - can proxy protocol not be used ? All cloud load balancers and haproxy use this for source ip preservation.

Calico also has ipip mode - https://docs.projectcalico.org/v2.2/usage/configuration/ip-in-ip - which is one of the reasons why github uses it. https://githubengineering.com/kubernetes-at-github/

@tonysongtl that’s not related to this issue

@tkeeler33 --opt encrypted should not affect host-port mapping. The only purpose of the encrypted option is to encrypt the vxlan tunnel traffic between the nodes. From the docs: “If you are planning on creating an overlay network with encryption (--opt encrypted), you will also need to ensure protocol 50 (ESP) traffic is allowed.” Can you please check your configuration to make sure ESP is allowed? Also, the --opt encrypted option is purely data-plane encryption. All the control-plane traffic (routing exchanges, service discovery distribution, etc.) is encrypted by default, even without the option.
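
On a plain iptables host, allowing ESP between the nodes looks something like the following (a sketch; 10.0.0.0/24 is an assumed node subnet, and on cloud providers the equivalent is a security-group rule for protocol 50):

```shell
# Allow IPsec ESP (IP protocol 50) from the other swarm nodes, so the
# --opt encrypted vxlan tunnels can establish.
iptables -A INPUT -p esp -s 10.0.0.0/24 -j ACCEPT
```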

How can I use a stack file (yml v3) to get the same behaviour as when I would use --publish mode=host,target=80,published=80 via docker service create?

@hamburml - keep an eye on https://github.com/docker/docker/issues/30447 its an open issue/feature.

Sorry for double post… How can I use a stack file (yml v3) to get the same behaviour as when I would use --publish mode=host,target=80,published=80 via docker service create?

I tried

...
services:
  proxy:
    image: vfarcic/docker-flow-proxy:1.166
    ports:
      - "80:80/host"
      - "443:443/host" 
...

but that’s not working (used same pattern as in https://docs.docker.com/docker-cloud/apps/stack-yaml-reference/#/ports)
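
For readers arriving later: the /host suffix isn’t valid compose syntax, but compose file format 3.2 added a long-form ports syntax that supports host mode, along these lines (a sketch reusing the image from above):

```yaml
# Compose file format 3.2+ long syntax; equivalent to
# --publish mode=host,target=80,published=80
version: "3.2"
services:
  proxy:
    image: vfarcic/docker-flow-proxy:1.166
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
```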

@mavenugo I updated to docker 1.13 today and used mode=host on my proxy service. Currently it works, Client IP is preserved, but I hope for a better solution 😃 Thanks for your work!

Sure, and yes, a doc update to indicate this behavior and the workaround of using publish mode=host will be useful for such use-cases that fail in LVS-NAT mode.

Fair enough I guess @mavenugo given we have an alternative now.

At the very least, can we amend the docs for 1.13 so they clearly state that when using docker services with the default ingress publishing mode, the source IP is not preserved, and hint at using host mode if this is a requirement for running the service?

I think it will help people who are migrating to services not to be burnt by this unexpected behaviour.

@aluzzardi any update for us ?

@sanimej Good idea: it could add all the IPs to the X-Forwarded-For header if possible, so we can see the whole chain.

@PanJ Hmm, and how does your standalone nginx container communicate with the swarm instance, via service name or IP? Maybe you can share the part of the nginx config where you pass traffic to the swarm instance.

@sanimej I kinda saw how it works when I dug into the issue. But the use case (ability to retrieve user’s IP) is quite common.

I have limited knowledge on how the fix should be implemented. Maybe a special type of network that does not alter source IP address?

Rancher is similar to Docker swarm mode and it seems to have expected behavior. Maybe it is a good place to start.

@PanJ The way the published port of a container is accessed is different in swarm mode. In the swarm mode a service can be reached from any node in the cluster. To facilitate this we route through an ingress network. 10.255.0.x is the address of the ingress network interface on the host in the cluster from which you try to reach the published port.