moby: Unable to retrieve user's IP address in docker swarm mode
Output of docker version:
Client:
Version: 1.12.0
API version: 1.24
Go version: go1.6.3
Git commit: 8eab29e
Built: Thu Jul 28 22:00:36 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.0
API version: 1.24
Go version: go1.6.3
Git commit: 8eab29e
Built: Thu Jul 28 22:00:36 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 155
Running: 65
Paused: 0
Stopped: 90
Images: 57
Server Version: 1.12.0
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 868
Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host overlay null bridge
Swarm: active
NodeID: 0ddz27v59pwh2g5rr1k32d9bv
Is Manager: true
ClusterID: 32c5sn0lgxoq9gsl1er0aucsr
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot interval: 10000
Heartbeat tick: 1
Election tick: 3
Dispatcher:
Heartbeat period: 5 seconds
CA configuration:
Expiry duration: 3 months
Node Address: 172.31.24.209
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-92-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.42 GiB
Name: ip-172-31-24-209
ID: 4LDN:RTAI:5KG5:KHR2:RD4D:MV5P:DEXQ:G5RE:AZBQ:OPQJ:N4DK:WCQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: panj
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.):
Steps to reproduce the issue:
- Run the following service, which publishes port 80:
docker service create \
  --name debugging-simple-server \
  --publish 80:3000 \
  panj/debugging-simple-server
- Try connecting with http://<public-ip>/.
Describe the results you received:
Neither ip nor header.x-forwarded-for is the correct user's IP address.
Describe the results you expected:
ip or header.x-forwarded-for should be the user's IP address. The expected result can be achieved using a standalone docker container: docker run -d -p 80:3000 panj/debugging-simple-server. You can see both of the results via the following links:
http://swarm.issue-25526.docker.takemetour.com:81/
http://container.issue-25526.docker.takemetour.com:82/
Additional information you deem important (e.g. issue happens only occasionally):
This happens on both global mode and replicated mode.
I am not sure if I missed anything that should solve this issue easily.
In the meantime, I think I have to use a workaround, which is running a proxy container outside of swarm mode and letting it forward to the published port in swarm mode (SSL termination should be done on this container too), which breaks the purpose of swarm mode for self-healing and orchestration.
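For illustration, a rough sketch of that kind of workaround; the container names, ports and proxy config path are placeholders, not part of the original report:

```bash
# Swarm service published on an internal port only (still behind the ingress mesh).
docker service create --name app --publish 8080:3000 panj/debugging-simple-server

# Standalone proxy container run outside swarm mode: a plain -p publish preserves
# the client IP, and SSL termination can be done here before forwarding to :8080.
docker run -d --name edge-proxy -p 80:80 -p 443:443 \
  -v /srv/edge-proxy.conf:/etc/nginx/conf.d/default.conf:ro \
  nginx
```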
About this issue
- State: open
- Created 8 years ago
- Reactions: 218
- Comments: 353 (35 by maintainers)
Links to this issue
Commits related to this issue
- workaround moby/moby#25526 — committed to wenzowski-docker/traefik by wenzowski 7 years ago
- xsnippet-web: "publish" the exposed port to the host "Publish" the exposed port to the host effectively configuring a DNAT iptables rule. This is useful for us, becase we get real remote IPs in nginx... — committed to xsnippet/xsnippet-infra by malor 6 years ago
- Revert back to GitHub for downloads atm This experiment has shown there are some important things to resolve first: * Client IP address is being lost. Looks like Moby issue 25526: https://... — committed to sqlitebrowser/sqlitebrowser by justinclift 6 years ago
- Because the real IP cannot be obtained, switch to host mode; see https://github.com/moby/moby/issues/25526 — committed to jiladahe1997/docker-jiladahe1997 by jiladahe1997 4 years ago
I’ve also run into the issue when trying to run logstash in swarm mode (for collecting syslog messages from various hosts). The logstash “host” field always appears as 10.255.0.x, instead of the actual IP of the connecting host. This makes it totally unusable, as you can’t tell which host the log messages are coming from. Is there some way we can avoid translating the source IP?
I agree with @dack: given the ingress network is using IPVS, we should solve this issue using IPVS so that the source IP is preserved and presented to the service correctly and transparently.
The solution needs to work at the IP level so that any services that are not based on HTTP can still work properly as well (can't rely on HTTP headers…).
And I can't stress enough how important this is; without it, there are many services that simply can't operate at all in swarm mode.
People really should stop saying "Mode: host" = working, because that's not using ingress. It makes it impossible to have just one container with a service running on the swarm but still be able to access it via any host. You either have to make the service "global" or you can only access it on the host it is running on, which kinda defeats the purpose of Swarm.
TLDR: "Mode: host" is a workaround, not a solution.
We've now released v3.1.0 of https://github.com/newsnowlabs/docker-ingress-routing-daemon, which modifies docker's ingress mesh routing to expose true client IPs to service containers.
As far as I know, the docker-ingress-routing-daemon is the most lightweight way to access client IPs from within containers launched by docker services.
Summary of features: rp_filter=1 (strict) inside service containers (though this can be disabled).
Please check it out and raise any issues you find.
To Docker,
Wake up! There is an obvious problem given how many people are involved in this issue (there are others with the same cause). All we’re getting are people who repeat over and over again that there is a workaround, even though it’s been explained quite a few times why that workaround is not a solution. The very word “workaround” indicates that it is a temporary thing that will be resolved later. It’s been over 3 years since the issue was created and for all that time the response is “there is a workaround”.
To all Swarm users,
Let’s be realistic. The sad truth is that no one, including Docker, truly cares about Swarm. Everyone moved to k8s and there are no “real” investments in Swarm. The project is on life-support waiting to die so do not expect this issue to be fixed. Be smart and move to k8s.
@thaJeztah thanks for the workaround 😃 If you are deploying your proxy with compose version 3, the new publish syntax is not supported, so we can patch the deployed service using this command (replace nginx_proxy with the service name).
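A hedged sketch of what such a patch can look like, assuming the goal is to re-publish the proxy's port in host mode; the port (443) and service name are placeholders:

```bash
# Replace the ingress-mode publish with a host-mode publish on the running service.
docker service update \
  --publish-rm 443 \
  --publish-add mode=host,target=443,published=443 \
  nginx_proxy
```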
Whether you call it a bug or a feature request, ingress mesh without source NAT is (in my opinion) essential. There are many applications that break when they can't see the true source IP. Sure, in the case of web servers you can reverse proxy using a host node and add client IP headers. However, this adds overhead and is probably not an option for non-web-based applications. With an application that actually needs the real source IP on the packet to be correct, the only option is to not use ingress mesh. That throws out a large part of the benefit of using swarm in the first place.
You can use traefik in host mode to get the real IP.
@tkeeler33 seems to work for me; test if the web service is able to connect with the something service on the same network.
+1 for a solution for this issue. The inability to retrieve the user's IP prevents us from using monitoring solutions like Prometheus.
@trajano is right, the Windows client was the problem, deployment with the Linux client worked.
But I don't understand why you even need the host or bridge network? The following works just fine for me, i.e. I get real client IP addresses in nginx.
2020 and still not fixed, what a drag. Seems like a very important feature.
For anyone running nginx on DigitalOcean with docker swarm and trying to get the real $remote_addr instead of just 10.255.0.2 within your nginx logs: you can use the solution from @coltenkrauter. The catch is that you can only run one nginx container on the host with this solution, which should be OK for most people.
Just change your docker-compose.yml file from the incorrect form to the correct form of the port mapping (a sketch is below).
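A hedged sketch of the change being referred to, assuming the usual fix of switching the port mapping from the short (ingress) syntax to the long syntax with mode: host; image and ports are placeholders:

```bash
# INCORRECT - short syntax publishes via the ingress mesh, so nginx logs 10.255.0.x:
#   ports:
#     - "80:80"

# CORRECT - long syntax with mode: host preserves the real client IP:
cat > docker-compose.yml <<'EOF'
version: "3.7"
services:
  nginx:
    image: nginx
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
EOF
```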
edit: now we’re all guaranteed to get the right answer
running into the same issue, Is this going to be addressed? seems like basic functionality that should be slated for a release.
There is very little chance this is going to be fixed ever. AFAIK everyone considers k8s won the “race” and swarm is not needed, but I would say both can co-exist and be properly used depending on the necessities and skills of the team using these. RIP swarm 😃
Why do people expect that other people will do the work for them?
I’d love to be the hero and take care of this, but the reality is I’m working on many other things and this has no effect on my day to day. Does this affect your day to day? We’d love some help getting this resolved!
I’ve also looked at this multiple times and it really doesn’t seem like there is a way to make this work with IPVS NAT, which is what the magical swarm routing is using.
I agree that k8s is much more flexible here. If it suits your needs better then use it. Complaining that it’s not fixed and then threatening to switch to k8s really has no place in our issue tracker and is just generally unhelpful.
+1, this really is a showstopper. I would believe the majority of applications need the real client's IP. Just think of a mailserver stack - you can't afford to accept mail from arbitrary hosts.
OK, I've had a brief look through the code and I think I have a slightly better understanding of it now. It does indeed appear to be using IPVS as stated in the blog. SNAT is done via an iptables rule which is set up in service_linux.go. If I understand correctly, the logic behind it would be something like this (assuming node A receives a client packet for the service running on node B):
I think the reasoning behind the SNAT is that the reply must go through the same node that the original request came through (as that’s where the NAT/IPVS state is stored). As requests may come through any node, the SNAT is used so that the service node knows which node to route the request back through. In an IPVS setup with a single load balancing node, that wouldn’t be an issue.
So, the question is then how to avoid the SNAT while still allowing all nodes handle incoming client requests. I’m not totally sure what the best approach is. Maybe there’s a way to have a state table on the service node so that it can use policy routing to direct replies instead of relying on SNAT. Or maybe some kind of encapsulation could help (VXLAN?). Or, the direct routing method of IPVS could be used. This would allow the service node to reply directly to the client (rather than via the node that received the original request) and would allow adding new floating IPs for services. However, it would also mean that the service can only be contacted via the floating IP and not the individual node IPs (not sure if that’s a problem for any use cases).
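For anyone wanting to look at the rule being discussed, it can be inspected from the ingress_sbox network namespace; a sketch (the exact rule text varies by Docker version):

```bash
# List the NAT POSTROUTING rules inside the ingress_sbox namespace.
nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -S POSTROUTING
# Typically shows an SNAT rule along the lines of:
#   -A POSTROUTING -d 10.255.0.0/16 -m ipvs --ipvs -j SNAT --to-source 10.255.0.2
```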
@marech the standalone container listens to port 80 and then proxies to localhost:8181. If you have to do SSL termination, add another server block that listens to port 443, then do the SSL termination and proxy to localhost:8181 as well. Swarm mode's nginx publishes 8181:80 and routes to another service based on the request host.
Seems like everyone leveling up from docker-compose to docker swarm encounters this issue. Happy new year 2021 guys, I hope I won't see it in 2022 🙈
Below is an improved version of the ingress routing daemon, ingress-routing-daemon-v2, which extends the policy routing rule model to allow each container to route its output packets back to the correct node, without the need for SNAT.
The improved model
In addition to inhibiting the SNAT rule as per the previous model, the new model requires an iptables rule in the ingress_sbox namespace on each node you intend to use as an IPVS load-balancer endpoint (so normally your manager nodes, or a subset of those manager nodes), that assigns a per-node TOS value to all packets destined for any node in the ingress network. (We use the final byte of the node’s ingress network IP.)
As the TOS value is stored within the packet, it can be read by the destination node to which the incoming request has been directed and the packet sent.
Then in the container on the destination node, we arrange to map the TOS value on any incoming packets to a connection mark, using the same value.
Now, since outgoing packets on the same connection will have the same connection mark, we map the connection mark on any outgoing packets to a firewall mark, again using the same value.
Finally, a set of policy routing rules selects a different routing table, designed to route the outgoing packets back to the required load-balancer endpoint node, according to the firewall mark value.
Now, when client requests arrive at the published ports for any node in the swarm, the container (whether on the same and/or other nodes) to which the request is directed will see the original IP address of the client making the request, and be able to route the response back to the originating load-balancer node; which will, in turn, be able to route the response back to the client.
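A condensed sketch of the rule set this model implies; the values are illustrative, and the real daemon generates these per node and per container:

```bash
NODE_ID=2   # final byte of this load-balancer node's ingress network IP

# In the ingress_sbox namespace of each load-balancer endpoint node:
# tag packets destined for the ingress network with this node's TOS value.
nsenter --net=/var/run/docker/netns/ingress_sbox \
  iptables -t mangle -A POSTROUTING -d 10.0.0.0/24 -j TOS --set-tos "${NODE_ID}/0xff"

# Inside each service container, for every load-balancer node ID:
# map incoming TOS -> connection mark, and connection mark -> firewall mark on replies...
iptables -t mangle -A PREROUTING -m tos --tos "$NODE_ID" -j CONNMARK --set-xmark "$NODE_ID"
iptables -t mangle -A OUTPUT -m connmark --mark "$NODE_ID" -j MARK --set-xmark "$NODE_ID"

# ...and route marked replies back via that node's ingress network IP.
ip route add default via "10.0.0.${NODE_ID}" table "$NODE_ID"
ip rule add fwmark "$NODE_ID" lookup "$NODE_ID"
```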
Usage
Setting up
Generate a value for INGRESS_NODE_GATEWAY_IPS specific to your swarm, by running ingress-routing-daemon-v2 as root on every one of your swarm's nodes that you'd like to use as a load-balancer endpoint (normally only your manager nodes, or a subset of your manager nodes), noting the values shown for INGRESS_DEFAULT_GATEWAY. You only have to do this once, or whenever you add or remove nodes. Your INGRESS_NODE_GATEWAY_IPS should look like 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5 (according to the subnet defined for the ingress network, and the number of nodes).
Running the daemon
Run INGRESS_NODE_GATEWAY_IPS="<Node Ingress IP List>" ingress-routing-daemon-v2 --install as root on each and every one of your swarm's nodes (managers and workers) before creating your service. (If your service is already created, then ensure you scale it to 0 before scaling it back to a positive number of replicas.) The daemon will initialise iptables, detect when docker creates new containers, and apply new routing rules to each new container.
If you need to restrict the daemon's activities to a particular service, then modify [ -n "$SERVICE" ] to [ "$SERVICE" = "myservice" ].
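Condensed, the setup and run steps above look roughly like this; the IPs are examples only, and the script is the one from the repository linked above:

```bash
# 1. On each prospective load-balancer endpoint node, run the daemon once
#    and note the INGRESS_DEFAULT_GATEWAY value it reports.
./ingress-routing-daemon-v2

# 2. On every node (managers and workers), install the rules and run the daemon
#    before creating (or rescaling) the service.
INGRESS_NODE_GATEWAY_IPS="10.0.0.2 10.0.0.3" ./ingress-routing-daemon-v2 --install
```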
Uninstalling iptables rules
Run ingress-routing-daemon-v2 --uninstall on each node.
Testing
The ingress-routing-daemon-v2 script has been tested with 8 replicas of a web service deployed to a four-node swarm.
Curl requests for the service, directed to any of the specified load-balanced endpoint node IPs, returned successful responses, and examination of the container logs showed the application saw the incoming requests as originating from the curl client's IP.
Limitations
As the TOS value can store an 8-bit number, this model can in principle support up to 256 load-balancer endpoint nodes.
However as the model requires every container be installed with one iptables mangle rule + one policy routing rule + one policy routing table per manager endpoint node, there might possibly be some performance degradation as the number of such endpoint nodes increases (although experience suggests this is unlikely to be noticeable with <= 16 load-balancer endpoint nodes on modern hardware).
If you add load-balancer endpoint nodes to your swarm - or want to start using existing manager nodes as load-balancer endpoints - you will need to tread carefully, as existing containers will not be able to route traffic back to the new endpoint nodes. Try restarting INGRESS_NODE_GATEWAY_IPS="<Node Ingress IP List>" ingress-routing-daemon-v2 with the updated value for INGRESS_NODE_GATEWAY_IPS, then perform a rolling update of all containers, before using the new load-balancer endpoint.
Scope for native Docker integration
I'm not familiar with the Docker codebase, but I can't see anything that ingress-routing-daemon-v2 does that couldn't, in principle, be implemented by Docker natively; I'll leave that for the Docker team to consider, or as an exercise for someone familiar with the Docker code.
The ingress routing daemon v2 script
Here is the new ingress-routing-daemon-v2 script.
After 3 years, no fix?
I’m trying to get my team to build a PR which adds the proxy protocol to the ingress network. We are not Golang programmers, so we find it a bit tricky.
But I’m fervently hoping that the Docker team agrees that the best and most compatible (across the ecosystem) solution is to layer on proxy protocol support to the ingress network.
The complexity comes in the fact that the ingress network not only has to inject its own headers, but it has to support the fact that there might be upstream proxy protocol headers already inserted (for example Google LB or AWS ELB) .
@darrellenns there are over 200 comments here; I think it would be better to lock and clean this issue, providing the basic "just use host bind if it applies to you" solution while no official solution is provided, otherwise more people like me will miss that and just keep commenting the same stuff over and over.
I found an acceptable solution for my scenario:
This will cause apache to listen on the host computer instead of behind the overlay network (reading the proper remote IP address), while still proxying requests to other services via the networks options and achieving "high availability" by having it running everywhere.
Hi Roberto, I don't think it is exaggerated - because host mode exposes single points of failure. Moreover, it expects additional layers of management for load balancing outside the swarm ecosystem.
By saying that you used azure lb yourself, you have kind of validated that argument.
It is tantamount to saying that “to run swarm with client ip propagation, make sure you are using an external load balancer that you setup…Or use one of the cloud services”.
We are not saying that it is not a temporary workaround…But it would be ignoring the promise of Swarm if we all do not categorically recognize the shortcoming.
It’s 2018. Anything newer about this issue?
In swarm mode, I can’t use nginx req limit. $remote_addr always caught 10.255.0.2. This is a really serious problem about docker swarm. Perhaps I should try kubernetes since today.
@tlvenn as far as I know, Docker Swarm uses masquerading, since it's the most straightforward way and guaranteed to work in most configurations. Plus, this is the only mode that actually allows masquerading ports too [re: @dack], which is handy. In theory, this issue could be solved by using IPIP encapsulation mode - the packet flow will be like this then:
There are, of course, many caveats and things-which-can-go-wrong, but generally this is possible and IPIP mode is widely used in production.
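For reference, the IPVS tunnelling (IPIP) mode being suggested can be set up with ipvsadm outside of any Docker integration; a minimal sketch with placeholder addresses:

```bash
# Virtual service on the load balancer, round-robin scheduling.
ipvsadm -A -t 203.0.113.10:80 -s rr
# Real server added in IPIP (tunnelling) mode: the packet is encapsulated and the
# original client source IP is preserved inside the tunnel.
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.5:80 -i
```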
I believe I may have found a workaround for this issue, with the current limitation that service container replicas must all be deployed to a single node, for example with --constraint-add='node.hostname==mynode', or with a set of swarms each consisting of a single node.
The problem
The underlying problem is caused by the SNAT rule in the iptables nat table in the ingress_sbox namespace, which causes all incoming requests to be seen by containers as having the node's IP address in the ingress network (e.g. 10.0.0.2, 10.0.0.3, …, in the default ingress network configuration).
However, removing this SNAT rule means that while containers still receive incoming packets - now originating from the original source IP - outgoing packets sent back to the original source IP are sent via the container’s default gateway, which is not on the same ingress network but on the docker_gwbridge network (e.g. 172.31.0.1), and those packets are then lost.
The workaround
So the workaround comprises:
1. removing (in fact, inhibiting) this SNAT rule in the ingress_sbox namespace;
2. creating a policy routing rule for swarm service containers that forces those outgoing packets back to the node's ingress network IP address they would have gone back to (e.g. 10.0.0.2);
3. automating the addition of the policy routing rules, so that every new service container has them promptly installed upon creation.
(We do it this way, rather than just deleting the existing SNAT rule, as docker seems to recreate the SNAT rule several times during the course of creating a service. This approach just supersedes that rule, which makes it more resilient).
Using docker event, we automate the process of modifying the SNAT rules, watching for newly started containers, and adding the policy routing rules, via this ingress-routing-daemon script.
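In outline, and assuming the default ingress subnet 10.0.0.0/24 with ingress gateway 10.0.0.2, such a daemon does roughly the following (this is a heavily reduced sketch only; the complete daemon is published at https://github.com/newsnowlabs/docker-ingress-routing-daemon):

```bash
#!/bin/bash
# Heavily reduced sketch only - not the full ingress-routing-daemon.

INGRESS_SUBNET=10.0.0.0/24
INGRESS_GATEWAY=10.0.0.2   # this node's ingress network IP

# 1. Inhibit (supersede) the SNAT rule inside the ingress_sbox namespace.
nsenter --net=/var/run/docker/netns/ingress_sbox \
  iptables -t nat -I POSTROUTING -d "$INGRESS_SUBNET" -j ACCEPT

# 2. Watch for newly started containers and add a policy routing rule inside each,
#    so replies to ingress traffic are sent back via the node's ingress gateway.
docker events --filter 'event=start' --format '{{.ID}}' | while read -r cid; do
  pid=$(docker inspect -f '{{.State.Pid}}' "$cid") || continue
  nsenter -n -t "$pid" sh -c "
    ip route add default via $INGRESS_GATEWAY table 1 &&
    ip rule add from $INGRESS_SUBNET lookup 1
  " 2>/dev/null
done
```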
Now, when requests arrive at the published ports for the single node, its containers will see the original IP address of the machine making the request.
Usage
Run the above ingress-routing-daemon as root on each and every one of your swarm nodes before creating your service. (If your service is already created, then ensure you scale it to 0 before scaling it back to a positive number of replicas.) The daemon will initialise iptables, detect when docker creates new containers, and apply new routing rules to each new container.
Testing, use-cases and limitations
The above has been tested using multiple replicas constrained to a single node on a service running on a multi-node swarm.
It has also been tested using multiple nodes, each with a separate per-node service constrained to that node, but this comes with the limitation that different published ports must be used for each per-node service. Still that might work for some use-cases.
The method should also work using multiple nodes, if each were configured as a single node in its own swarm. This carries the limitation that the docker swarms can no longer be used to distribute containers across nodes, however there could still be other administration benefits of using docker services, such as container replica and lifecycle management.
Improving the workaround to address further use-cases
With further development, this method should be capable of scaling to multiple nodes without the need for separate per-node services or splitting the swarm. I can think of two possible approaches:
1. Arranging for Docker, or a bespoke daemon, to remove all non-local IPs from each node's ipvsadm table.
2. Extending the policy routing rules to accommodate routing output packets back to the correct node.
For 1, we could poll ipvsadm -S -n to look for new IPs added to any service, check whether each is local, and remove any that aren’t. This would allow each node to function as a load balancer for its own containers within the overall service, but without requests reaching one node being able to be forwarded to another. This would certainly satisfy my own use-case, where we have our own IPVS load balancer sitting in front of a set of servers, each running a web application, which we would like to replace with several load-balanced containerised instances of the same application, to allow us to roll out updates without losing a whole server.
For 2, we could use iptables to assign a per-node TOS in each node’s ingress_sbox iptable (for example to the final byte of the node ingress network IP); then in the container, arrange to map the TOS value to a connection mark, and then from a connection mark to a firewall mark for outgoing packets, and for each firewall mark select a different routing table that routes the packets back to the originating node. The rules for this will be a bit clunky, but I imagine should scale fine to 2-16 nodes.
I hope the above comes in useful. I will also have a go at (2), and if I make progress will post a further update.
We just use haproxy to manage certs and offload SSL. People keep missing that the solution "running in host mode" is not a solution. They want it working with the ingress network to take advantage of the docker load balancing. The whole thread is basically a 'use host mode' -> 'not possible because "reasons"' circle which has been going for 3 years now.
TBH I’m not sure why the ingress network is not being patched to add ip data in proxy protocol.
It’s incremental, it won’t break existing stacks, it is a well defined standard, it’s widely supported by even the big cloud vendors, it’s widely supported by application frameworks.
Is it a significant Dev effort?
@cpuguy83 i have been following some of the incoming proxy protocol features in k8s. E.g. https://github.com/kubernetes/kubernetes/issues/42616 (P.S. interestingly the proxy protocol here is flowing in from the Google Kubernetes Engine, which supports proxy protocol natively in HTTPS mode).
In addition, ELB has added support for Proxy Protocol v2 in Nov 2017 (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/doc-history.html)
Openstack Octavia LB-as-a-service (similar to our ingress) merged proxy protocol last April -http://git.openstack.org/cgit/openstack/octavia/commit/?id=bf7693dfd884329f7d1169eec33eb03d2ae81ace
Here’s some of the documentation around proxy protocol in openstack - https://docs.openshift.com/container-platform/3.5/install_config/router/proxy_protocol.html Some of the nuances are around proxy protocol for https (both in cases when you are terminating certificates at ingress or not).
Examining the flow, it seems to currently work like this (in this example, node A receives the incoming traffic and node B is running the service container):
I think the SNAT could be avoided with something like this:
As an added bonus, no NAT state needs to be stored and overlay network traffic is reduced.
@Damidara16 that's exactly what we don't want to do. It's really insecure to do that. You can bypass it however you want.
@BretFisher the mode: host is only a workaround, not the solution. As @sandys said, the workaround has a few caveats, so we should not consider this issue fixed.
I'm not sure if there has been any improvement since the workaround was discovered. I moved to Kubernetes quite a long time ago and am still surprised that the issue has been open for over two years.
@r3pek While I agree with you that you lose ingress if you use host mode to solve this predicament, I'd say that it hardly defeats the whole purpose of Swarm, which does so much more than a public facing ingress network. In our usage scenario we have in the same overlay swarm: management replicated containers that should only be accessed over the intranet -> they don't need the caller's IP, therefore they are configured "normally" and take advantage of the ingress; non-exposed containers -> nothing to say about these (I believe you are underestimating the power of being able to access them via their service name though); a public facing service -> this is an nginx proxy that does https and URL-based routing. It was defined global even before the need to x-forward-for the client's real IP, so no real issue there.
Having nginx global and not having ingress means that you can reach it via any IP of the cluster, but it's not load balanced or fault tolerant, so we added a very very cheap and easy to set up L4 Azure Load Balancer in front of the nginx service.
As you say, host is a workaround, but saying that enabling it completely defeats the purpose of Docker Swarm is a little exaggerated imo.
@sanimej Yes, it is the expected behavior that should be on swarm mode as well.
@cpuguy83 this has started becoming a blocker for our larger swarm setups. As we start leveraging more of the cloud (where proxy protocol is used de facto by load balancers), we are losing this info, which is very important to us.
Do you have any idea of an ETA ? this would help us a lot.
I agree. Swarm needs a high availability way to preserve source IP.
Probably using proxy protocol. I don’t think it’s a huge effort to add proxy protocol support to docker swarm.
Is anyone looking into this ?
@goetas mode=host worked for a while as a workaround, so I wouldn't say the problem is somehow solved. Using mode=host has lots of limitations: the port is exposed, you can't use swarm load balancing, etc.
So the kubernetes documentation is not complete. Another way which is pretty common is actually ingress + proxy protocol.
https://www.haproxy.com/blog/haproxy/proxy-protocol/
Proxy protocol is a widely accepted protocol that preserves source information. Haproxy comes with built-in support for proxy protocol. Nginx can read but not inject proxy protocol.
Once the proxy protocol is setup, you can access that information from any downstream services like https://github.com/nginxinc/kubernetes-ingress/blob/master/examples/proxy-protocol/README.md
Even openshift leverages this for source IP information https://docs.openshift.org/latest/install_config/router/proxy_protocol.html
This is the latest haproxy ingress for k8s that injects proxy protocol.
IMHO the way to do this in swarm is to make the ingress able to read proxy protocol (in case it’s receiving traffic from an upstream LB that has already injected proxy protocol) as well as inject proxy protocol information (in case all the traffic actually hits the ingress first).
I am not in favour of doing it any other way especially when there is a generally accepted standard to do this.
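As one concrete downstream example (not Docker-specific), this is roughly what consuming PROXY protocol looks like in nginx using the standard realip-module directives; the trusted range and upstream name are placeholders:

```bash
cat > /etc/nginx/conf.d/proxy-protocol.conf <<'EOF'
server {
    listen 80 proxy_protocol;         # accept the PROXY protocol header
    set_real_ip_from 10.0.0.0/24;     # trust the ingress / load-balancer range
    real_ip_header proxy_protocol;    # take the client address from the header
    location / {
        proxy_set_header X-Real-IP $proxy_protocol_addr;
        proxy_pass http://app:3000;
    }
}
EOF
```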
@mrjana The whole idea of using IPVS (instead of whatever docker currently does in swarm mode) would be to avoid translating the source IP to begin with. Adding an X-Forwarded-For might help for some HTTP applications, but it’s of no use whatsoever for all the other applications that are broken by the current behaviour.
@mavenugo it's koa's request object, which uses node's remoteAddress from the net module. The result should be the same for any other libraries that can retrieve the remote address. The expectation is that the ip field should always be the remote address regardless of any configuration.
@zimbres If you can raise an issue at https://github.com/newsnowlabs/docker-ingress-routing-daemon/issues outlining your setup, DIND version, and so on, I'll be pleased to respond there.
N.B. v3.3.0 has just been released, which is necessary to upgrade to for UDP-based services like DNS.
It is interesting to know, but see, this feature is available on kubernetes but not in docker swarm mode, and you are insisting there are options to run multiple instances of traefik, but only across multiple nodes; if I want to run multiple instances on a single node, it is not possible, because this is not supported. Also, any other service that does not just proxy requests is not allowed to map any port, because it needs a special kind of configuration that maps every host to it, and anyway it needs multiple nodes, at least one per instance.
And so on, and so on. You can scroll this discussion up and find other concerns about it. I do not think it can be reduced to a demonstration of how good you are at producing workarounds, because those remain workarounds that are hard to maintain and hard to follow. And all the time spent maintaining special-case workarounds would be better spent fixing the problem.
On the other hand, if this kind of feature is a security problem for the docker swarm model, just mark it as wontfix and I would plan to switch to kubernetes. If that is the case, I do not think there is a conflict between the projects; it is just saying explicitly that it will never happen, so everybody can take action, if possible before choosing docker swarm mode for any kind of swarm setup.
So, I believe that this bug affects traefiks ability to whitelist ips. Is that correct?
Anyway, for anybody looking to run swarm mode, this is an example with using host mode to publish ports.
The host mode workaround has been discussed multiple times on this issue already. While it may be OK for some limited scenarios (such as certain reverse proxy web traffic setups), it is not a general solution to this problem. Please read the previous posts rather than re-hashing the same “solutions” over again.
Hi guys, is there a workaround for now, without publishing the port as a host port?
This needs to be done at the docker swarm ingress level. If the ingress does not inject proxy protocol data, none of the downstream services (including traefik, nginx, etc.) will be able to read it.
I’d just like to chime in; while I do understand that there is no easy way to do this, not having the originating IP address preserved in some manner severely hampers a number of application use cases. Here’s a few I can think of off the top of my head:
Being able to have metrics detailing where your users originate from is vital for network/service engineering.
In many security applications you need to have access to the originating IP address in order to allow for dynamic blacklisting based upon service abuse.
Location awareness services often need to be able to access the IP address in order to locate the user’s general location when other methods fail.
From my reading of this issue thread, it does not seem that the given work-around(s) work very well when you want to have scalable services within a Docker Swarm. Limiting yourself to one instance per worker node greatly reduces the flexibility of the offering. Also, maintaining a hybrid approach of having an LB/Proxy on the edge running as a non-Swarm orchestrated container before feeding into Swarm orchestrated containers seems like going back in time. Why should the user need to maintain 2 different paradigms for service orchestration? What about being able to dynamically scale the LB/Proxy at the edge? That would have to be done manually, right?
Could the Docker team perhaps consider these comments and see if there is some way to introduce this functionality, while still maintaining the quality and flexibility present in the Docker ecosystem?
As a further aside, I’m currently getting hit by this now. I have a web application which forwards authorized/authenticated requests to a downstream web server. Our service technicians need to be able to verify whether people have reached the downstream server, which they like to use web access logs for. In the current scenario, there is no way for me to provide that functionality as my proxy server never sees the originating IP address. I want my application to be easily scalable, and it doesn’t seem like I can do this with the work-arounds presented, at least not without throwing new VMs around for each scaled instance.
Just checking back in to see if there were any new developments in getting this real IP thing figured out? It certainly is a huge limitation for us as well.
Just to advise, we are now running docker swarm, in conjunction with the docker ingress-routing-daemon (documented above), in production on www.newsnow.co.uk, currently handling some 1,000 requests per second.
We run the daemon on all 10 nodes of our swarm, of which currently only two serve as load balancers for incoming web traffic, which direct traffic to containers running on a selection of 4 of the remaining nodes (the other nodes currently being used for backend processes).
Using the daemon, we have been able to avoid significant changes to our tech stack (no need for cloudflare or nginx) or to our application’s internals (which relied upon identifying the requesting client’s IP address for geolocation and security purposes).
i think a workaround for this and to have a docker swarm run without setting host is to get the IP on the client-side. ex. using js for web and mobile clients and only accept from trusted sources. ex. js -> get ip, backend only accepts ips that include user-token or etc. ip can be set in the header and encrypted through https. however, i don’t know about performance
We have had success using the PROXY protocol with DigitalOcean LB -> Traefik -> Apache container. The Apache container was able to log the real IPs of the users hitting the service. Theoretically should work as long as all the proxy layers support PROXY protocol.
https://docs.traefik.io/v1.7/configuration/entrypoints/#proxyprotocol
The Traefik service is on one Docker network named ‘ingress’, the Apache service has its own stack network but is also part of the ‘ingress’ network as external.
https://autoize.com/logging-client-ip-addresses-behind-a-proxy-with-docker/
try host-mode-networking
Following this advice fixes the issue, as the docker swarm balancer is now out of the equation. For me it is a valid solution since it is still HA and I already had haproxy (inside the docker flow proxy container). The only issue is that the haproxy stats are distributed among all the replicas, so I need to somehow aggregate that info when monitoring traffic for the whole cluster. In the past I just had one haproxy instance that was behind the docker swarm balancer. Cheers, Jacq
hi guys, if you want Docker Swarm support in Cilium (especially for ingress and around this particular problem), please comment/like on this bug - https://github.com/cilium/cilium/issues/4159
@cpuguy83 hi, thanks for your reply. im aware there is no broad agreement on how you want to solve it. I’m kind of commenting on how the team has been occupied on stability issues and is not freed up for this issue. When would you think that this issue would be taken up (if at all) ?
This is VERY bad; it defeats any rate limiting, fraud prevention, logging, secure logins, session monitoring etc.! Listening with mode:host works, but is no real solution, as you lose mesh load balancing and only the software load balancer on the host that has the public IP has to handle all the traffic alone.
@mostolog mode: host doesn't expose your container to the host network. It removes the container from the ingress network, which is how Docker normally operates when running a container. It would replicate the --publish 8080:8080 used in a docker run command. If nginx is getting real IPs, it's not a result of the socket being connected to those IPs directly. To test this you should seriously consider using a raw TCP implementation or HTTP server, without a framework, and check the reported address.
I think it will be closed by the bot soon. Since github launched this feature, many bugs can be ignored.
Please let us know whether this issue will be fixed or not. Should we use kubernetes instead?
Note you can solve this problem by running a global service and publishing ports using PublishMode=host. If you know which node people will be connecting on, you don’t even need that, just use a constraint to fix it to that node.
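A hedged example of that combination (global service plus host-mode publishing); the image and ports are placeholders:

```bash
docker service create \
  --name reverse-proxy \
  --mode global \
  --publish mode=host,target=80,published=80 \
  nginx
```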
@blazedd In our stack we have:
and so, I would bet we get real IP’s on our logs.
Is a solution on the roadmap for docker 1.14? We have delayed deploying our solutions using docker due in part to this issue.
ya pretty weird @mavenugo …
Regarding the publish mode, I had already linked this from swarm kit above, this could be a workaround but I truly hope a proper solution comes with Docker 1.13 to address this issue for good.
This issue could very much be categorized as a bug because preserving the source ip is the behaviour we as users expect and it’s a very serious limitation of the docker services right now.
I believe both @kobolog and @dack have come up with some potential leads on how to solve this and it’s been almost 2 weeks with no follow up on those from Docker side.
Could we please have some visibility on who is looking into this issue at Docker and a status update ? Thanks in advance.
The easiest way would be to add the header for the original IP for every http request.
Those are complex solutions - proxy protocol just adds additional header information and is a very well known standard - haproxy, nginx, AWS elb, etc all follow it. https://www.haproxy.com/blog/haproxy/proxy-protocol/
The surface area of the change would be limited to the Swarm built in ingress (where this support would be added). And all services will have it available.
What has changed here ? Because we have been using host mode to do this for a long time now. In fact that is the workaround suggested in this thread as well.
The problem is that of course you have to lock this service to a particular host so Swarm can’t schedule it elsewhere. Which is what the issue was entirely - that proxy protocol/IPVS, etc solve this problem.
When reading the OP's request ( @PanJ ), it seems current features now solve this problem, as has been suggested for months. The OP didn't ask for ingress routing + client IP AFAIK; they asked for a way to have a swarm service in replica/global mode obtain client IPs, which is now doable. Two main areas of improvement allow this to happen:
For me with 18.09 engine, I get the best of both worlds in testing. A single service can connect to backend overlay networks and also publish ports on the host NIC and see real client IP’s incoming on the host IP. I’m using that with traefik reverse proxy to log client IP traffic in traefik that is destined for backend services. I feel like this could solve most requests I’ve seen for “logging the real IP”.
@PanJ does this solve it for you?
The key is to publish ports in mode: host rather than mode: ingress (the default).
The pro of this mode is you get real client IPs and native host NIC performance (since it's outside IPVS encapsulation AFAIK). The con is it will only listen on the node(s) running the replicas.
To me, the request of “I want to use ingress IPVS routing and also see client IP” is a different feature request of libnetwork.
To be fair - in k8s, it is possible to have a custom ingress; in swarm it is not.
Swarm takes the stand that everything is "built-in". Same is the case with networks - in k8s, you need to set up weave, etc.; in swarm it's built in.
So the point that andrey is making (and I kind of agree with) is that swarm should make this feature part of the ingress, since the user has no control over it.
@bluejaguar @ruudboon I am part of Docker. This is a well known issue. Right now the network team is focused on long standing bugs with overlay networking stability. This is why there haven’t really been new networking features in the last few releases.
My suggestion would be to come up with a concrete proposal that you are willing to work on to resolve the issue or at least a good enough proposal that anyone could take it and run with it.
Is anyone in the recent part of this thread here to represent the docker team and at least say that ‘we hear you’ ? Seems quite something that a feature you would expect to be ‘out of the box’ and of such interest to the community is still not resolved after being first reported August 9th 2016, some 18 months ago.
@sandys https://github.com/sandys The proxy protocol looks like encapsulation (at least at connection initiation), which requires knowledge of the encapsulation from the receiver all the way down the stack. There are a lot of trade-offs to this approach.
That is true. That's pretty much why it's a standard with an RFC. There's momentum behind this though - pretty much every component of importance supports it. IMHO it's not a bad decision to support it.
I wouldn’t want to support this in core, but perhaps making ingress pluggable would be a worthwhile approach.
This is a larger discussion - however i might add that the single biggest advantage of Docker Swarm over others is that it has all batteries built-in.
I would still request you to consider proxy protocol as a great solution to this problem which has industry support.
The problem seems partially solved in 17.12.0-ce by using mode=host:
docker service create --publish mode=host,target=80,published=80 --name=nginx nginx
It has some limitations (no routing mesh) but works!
nginx supports IP Transparency using the TPROXY kernel module.
@stevvooe Can Docker do something like that too?
That’s how HaProxy is solving this issue: http://blog.haproxy.com/2012/06/05/preserve-source-ip-address-despite-reverse-proxies/
@jerrac - as also explained here: https://github.com/newsnowlabs/docker-ingress-routing-daemon/issues/24#issuecomment-1157077824 :-
To be clear, DIND exists to transform Docker’s ingress routing mesh to use policy routing instead of SNAT, to redirect client traffic to service nodes. It will only work to preserve the client IP if incoming requests directly reach a load-balancer node on a port published for a service via the ingress routing mesh. DIND is a network-layer tool (IPv4) and cannot inspect or modify HTTP headers.
I understand Traefik has often been used as a reverse proxy to work around the same limitation as DIND. In this model, incoming requests must directly reach the reverse proxy, which presumably must not be using the ingress routing mesh, but instead have its ports published using host mode, and be launched using --mode global. The Traefik reverse proxy will see the client IP of requests and can add these to the XFF header before reverse proxying them to an internal application service.
DIND therefore exists to solve a similar problem as a Traefik reverse proxy service placed in front of an internal application service, but without the need for the extra Traefik service (or for proxying, or for introduction/modification of XFF headers) and therefore without modification of the application service (if it doesn't natively support XFF headers).
Combining DIND with Traefik should allow Traefik itself to be deployed using the ingress routing mesh, which could be useful if Traefik is providing additional benefits in one’s setup.
However, I’m not sure I can see a use-case for combining DIND with an internal application service published via the ingress routing mesh, and still fronted by a Traefik reverse proxy. Since the reverse proxy node is the client for the internal application service request, doing this will just expose the Docker network IP of that node, instead of the ingress network IP, to the internal application service.
Hope this makes sense.
IIRC it was necessary to use the “long form” of the ports definition, like so:
any update?
@kaysond Not a good place to ask.
You are essentially asking two questions,
Both of them are hard to answer, in different ways.
I think you misunderstood my question. I understand why services would want to see the true source ip. I want to know why Docker changes it before it gets to a container
In our case, it's the combination of this workaround with the inability to bind a host-exposed port to a specific IP address. Instead, all internal services that need the real visitor's IP and support PROXY protocol have their port exposed on 0.0.0.0 on the host, which is less than optimal.
Another one is the non-negligible performance hit when you have hundreds of new connections per second - all the exposed ports are actually DNAT rules in iptables that require conntrack and have other problems (this hits k8s too, but Swarm has this additional level of NAT that makes it worse).
You can try to set up another nginx server outside the docker swarm cluster and forward requests to the swarm service; in this nginx conf just add the forward headers, e.g. location / { proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_pass http://phpestate; }
It seems there is no solution to get real client ip in the docker swarm mode.
It’s not a solution by itself, but can be used (and is being used) very successfully as a workaround. You can still use Docker’s native load balancer - all you’re doing is adding a layer to the host network stack before you hit Docker’s service mesh.
@ajardan I have tried that solution and it is not viable for me, as I have more than a single host to respond on the frontend. Ideally I want the entire swarm to be able to route the requests. I agree that for small scale operations simply flipping one service to host mode and using it as an ingest server can work fine.
Placing something like traefik in host mode negates the benefits we are trying to take advantage of from using swarm though, in most cases 😦
It would really be good to get this statement (“won’t fix”) so I can fully justify a migration to kubernetes. Such a shame.
Thanks.
@thaJeztah Can someone on the Docker Inc team update us on the status of this issue. Is it still being considered and/or worked on ? Any ETA ? Or is this completely ignored since Docker integration with Kubernetes ? It has been reported almost 3 years ago 😕
I have filed a feature request for proxy protocol support to solve the issue in this bug.
Just in case anyone wants to add their comments.
https://github.com/moby/moby/issues/39465
the right solution here is proxy protocol injected at L4 . there are some relevant pro and con discussions in Envoy for the same usecase https://github.com/envoyproxy/envoy/issues/4128 and https://github.com/envoyproxy/envoy/issues/1031
There are lots of features in kubernetes that are not in swarm, and vice versa. We all make decisions on which orchestrator to use for a specific solution based on many factors, including features. No one tool solves all problems/needs.
I’m just a community member trying to help. If you don’t like the current solutions for this problem, then it sounds like you should look at other ways to solve it, possibly with something like kubernetes. That’s a reasonable reason to choose one orchestrator over another if you think the kubernetes way of solving it is more to your liking.
Historically, the moby and swarm maintainers don’t close issues like this as wontfix because tomorrow someone from the community could drop a PR with a solution to this problem. Also, I think discussing the ways to work around it until then, are a valid use of this issue thread. 😃
While not a swarm maintainer, I can say that historically the team doesn’t disclose future feature plans beyond what PR’s you can currently see getting commits in the repos.
I'm still kind of surprised why people think this is a bug. From my perspective, even the statement "move to kubernetes" is not an adequate answer. As I see it, kubernetes has exactly the same problem/behavior. You either have an external LB, or use something like an nginx ingress proxy which must run as a daemonset. Please correct me if I am wrong, but we have the exact same situation here, just no prepared auto-solution. Somebody could check and package my proposed tcp stream solution described above to get something like nginx proxy behavior. Just accept that swarm needs to be customized by yourself.
Well, Docker does not currently touch ingress traffic, so definitely at least not insignificant to add. Keep in mind also this is an open source project, if you really want something then it’s generally going to be up to you to implement it.
As far as I know, the difference is that even if you deploy such a load-balancing service it will be 'called' from the swarmkit load balancer and so you lose the user's IP. So you cannot disable the swarmkit load balancer without using host mode.
@Mobe91 Try to recreate the swarm. I also had an error. After re-initialising the swarm, everything worked for me. My docker-compose.yml file:
my docker version:
Sorry for my English.
@kleptog Partially you can’t. It can’t avoid downtime while updating service.
Seems like something everyone would want at some point, and since using overlay networks together with bridge/host networking is not really possible, this is a blocker in cases when you really need the client IP for various reasons.
Client: Version: 17.12.0-ce API version: 1.35 Go version: go1.9.2 Git commit: c97c6d6 Built: Wed Dec 27 20:03:51 2017 OS/Arch: darwin/amd64
Server: Engine: Version: 17.12.1-ce API version: 1.35 (minimum version 1.12) Go version: go1.9.4 Git commit: 7390fc6 Built: Tue Feb 27 22:17:54 2018 OS/Arch: linux/amd64 Experimental: true
@sandys The proxy protocol looks like encapsulation (at least at connection initiation), which requires knowledge of the encapsulation from the receiver all the way down the stack. There are a lot of trade-offs to this approach.
I wouldn’t want to support this in core, but perhaps making ingress pluggable would be a worthwhile approach.
@cpuguy83 I couldn't understand what you just meant.
Proxy protocol is layer 4. http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
http://cbonte.github.io/haproxy-dconv/1.9/configuration.html#5.1-accept-proxy
Did you mean there was a better way than proxy protocol? That's entirely possible and I would love to know more in the context of source IP preservation in docker swarm. However, Proxy Protocol is more widely supported by other tools (like nginx, etc.) which will be downstream of swarm-ingress… as well as tools like AWS ELB which will be upstream of swarm-ingress. That was my only $0.02
These are L7 protocols. Swarm ingress is L4. There is nothing being reinvented here, it’s all IPVS using DNAT.
A few concerns with proxy protocol:
Is it decoded by docker itself, or by the application? If we are relying on the application to implement proxy protocol, then this is not a general solution for all applications and only works for web servers or other application that implement proxy protocol. If docker unwraps the proxy protocol and translates the address, then it will also have to track the connection state and perform the inverse translation on outgoing packets. I’m not in favor of a web-specific solution (relying on proxy protocol in the application), as docker is useful for many non-web applications as well. This issue should be addressed for the general case of any TCP/UDP application - nothing else in docker is web-specific.
As with any other encapsulation method, there is also the concern of packet size/MTU issues. However, I think this is probably going to be a concern with just about any solution to this issue. The answer to that will likely be make sure your swarm network supports a large enough MTU to allow for the overhead. I would think most swarms are run on local networks, so that’s probably not a major issue.
@trajano - We know it works with host networking (which is likely what your compose solution is doing). However, that throws out all of the cluster networking advantages of swarm (such as load balancing).
this is a very critical and important bug for us and this is blocking our go-live with Swarm. We also believe proxy protocol is the right solution for this. Docker ingress must pass source ip on proxy protocol.
On twitter one of the solutions that has been proposed is to use Traefik as ingress managed outside of Swarm. This is highly suboptimal for us - and not an overhead that we would like to manage.
If the Swarm devs want to check out how to implement proxy protocol in Swarm-ingress, they should check out all the bugs being discussed in Traefik (e.g. https://github.com/containous/traefik/issues/2619)
@sandys I agree. Proxy protocol would be great idea. @thaJeztah @aluzzardi @mrjana could this issue get some attention please? There haven’t been any response from team for a while. Thank you.
Hi.
For the sake of understanding and completeness, let me summarize and please correct me if I’m wrong:
The main issue is that containers aren’t receiving original src-IP but swarm VIP. I have replicated this issue with the following scenario:
It seems:
When services within swarm are using the (default) mesh, swarm does NAT to ensure traffic from the same origin is always sent to the same host-running-service? Hence, it's losing the original src-IP and replacing it with swarm's service VIP.
Seems @kobolog https://github.com/moby/moby/issues/25526#issuecomment-258660348 and @dack https://github.com/moby/moby/issues/25526#issuecomment-260813865 proposals were refuted by @sanimej https://github.com/moby/moby/issues/25526#issuecomment-280722179 https://github.com/moby/moby/issues/25526#issuecomment-281289906 but, TBH, his arguments aren't fully clear to me yet, nor do I understand why the thread hasn't been closed if this is definitively impossible. @stevvooe ?
@sanimej wouldn’t this work?:
Wouldn't an option to enable "reverse proxy instead of NAT" for specific services solve all these issues, satisfying everybody?
On the other hand, IIUC, the only option left is to use https://docs.docker.com/engine/swarm/services/#publish-a-services-ports-directly-on-the-swarm-node, which -again IIUC- seems to be like not using mesh at all, hence I don’t see the benefits of using swarm mode (vs compose). In fact, it looks like pre-1.12 swarm, needing Consul and so.
Thanks for your help and patience. Regards
Load balancing is done at L3/4. Adding an HTTP header is not possible.
A fix will involve removing the rewrite of the source address.
How did that happen?
@aluzzardi @mrjana Any update on this please? A little bit of feedback from Docker would be very much appreciated.
Re: traefik - don't you also have to deploy it as `global`? Maybe for a single node it doesn't matter, but for multiple I don't believe the mesh network will route traffic.

@struanb Awesome! Thanks again for the workaround. I'll start further discussions on the repository when appropriate to avoid derailing this bug report.
(1: It seems I copied the wrong line, I meant to refer to this line where 10.0.0.0/24 is hardcoded)
@Vaults Thanks for testing the daemon and for your feedback.
FYI We have now published our latest version 2.5.1 of the daemon at https://github.com/newsnowlabs/docker-ingress-routing-daemon. Among other changes, this version includes fixes for connections failing on busy load-balancer nodes (setting `net.ipv4.vs.conn_reuse_mode=0`, `net.ipv4.vs.expire_nodest_conn=1` and `net.ipv4.vs.expire_quiescent_template=1`, and adding `iptables -t raw -I PREROUTING -p tcp -j CT --notrack`). These took us some days to work out and track down, and I strongly recommend upgrading to use them if your load-balancer nodes might receive high traffic levels.

@struanb Thank you for the workaround.
For some reason it fully works (after the changes mentioned below) except for all the containers on one machine. I’ve checked all the iptables on the host, ingress node, containers and they all seem to be pretty identical. The connection times out nevertheless. Maybe the packet keeps getting rerouted forever? I may resume testing further, but does anyone have some ideas for debugging?
I’ve also made a few changes for my situation, maybe I didn’t fully understand what is going on in the script, but for me these were necessary:
As long as people post about it and don't work to fix it, we'll keep seeing it. There is very little time currently going into swarm from anyone.
I think this could work for a lot of us that need a way to get the real IP. Cloudflare can be set up as a proxy or just DNS only. It fits perfectly even for non-DigitalOcean customers. It is the cleanest workaround so far. But I agree with @beornf, we need a real solution, without depending on DigitalOcean or Cloudflare to get this done.
Thanks!
@sebastianfelipe that’s a big claim after all these years. You sure you’re not using host mode or other workarounds in this thread?
I use a managed HAIP, but you could use something else in front of the swarm, a standalone nginx load balancer that points to the IPs of your swarm. https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/
In your swarm, the reverse proxy needs this:
If you are running a swarm, you will need a load balancer to round-robin the requests to your swarm (or sticky, etc).
On the surface, this architectural decision may seem like a "missing piece"; however, it adds flexibility by providing options and removing the need to disable built-in functionality in order to replace it with something more suitable to the application's needs.
Too bad this is still an open issue, sadly… it doesn't look like it's going to be fixed soon.
https://github.com/docker/libnetwork
Out of curiosity… can some dev point me to the code that manages swarm networking?
I wonder where the best place to ask these questions is, because I am now very intrigued to read the history of those choices and how it all works, so I can get some more context here.
Maybe this is a naive question, but why is it necessary to rewrite the source IP to begin with? Wouldn't the traffic be returned via the interface's default gateway anyway? Even if it came via the swarm load balancer, the gateway could just return it via the load balancer, which already knows where the traffic came from…
Already done here: #39465
Please read the whole thread before commenting
@pattonwebz Host mode can be enabled for a service running multiple containers on multiple hosts; you can even do that with mode=global. Traefik will then run on all your swarm nodes, accept connections on the specified ports, and route the requests internally to the services that need to see these connections.
I used this setup with a service in global mode but limited to manager nodes, and it was working perfectly fine for tens of thousands of requests/s
I would be happy to elaborate if more details are required.
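For illustration, a stripped-down stack file for that pattern might look like the following. This is only a sketch based on the description above: the Traefik configuration itself is omitted, and names such as `webgateway` are placeholders, not the poster's actual setup.

```yaml
version: "3.2"

services:
  traefik:
    image: traefik:v1.7
    deploy:
      mode: global                 # one instance on every matching node
      placement:
        constraints:
          - node.role == manager   # only on managers, as described above
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host                 # bypass the ingress mesh so the real client IP is seen
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - webgateway                 # overlay network shared with the backend services

networks:
  webgateway:
    driver: overlay
```

Traffic reaches whichever node the client connected to directly, and Traefik forwards it over the overlay network, adding `X-Forwarded-For` along the way.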
I'm also having the same problem, but with haproxy. Though it's OK to have proxy servers in host mode and HA using keepalived, the only missing part would be load balancing, which I think is not much of an issue for a simple web proxy; unless complicated scripts are involved, or the proxy and backend are not on the same physical machine and network traffic is too high for one NIC, and…
Each node in the swarm can run an instance of the reverse-proxy, and route traffic to the underlying services over an overlay network (but only the proxy would know about the original IP-address).
Make sure to read the whole thread (I see GitHub hides quite a few useful comments, so you'll have to expand those 😞);
See https://github.com/moby/moby/issues/25526#issuecomment-367642600;
`X-Forwarded-For` is an L7 mechanism; Swarm ingress is L4, using IPVS with DNAT.

@sandys Something like this: https://gist.github.com/rubot/10c79ee0086a8a246eb43ab631f3581f
We switched to a `proxy_protocol` nginx global stream instance in host mode, which forwards to the replicated application `proxy_nginx`. This works well enough for the moment.

service global nginx_stream

service replicated nginx_proxy
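The corresponding nginx snippets are collapsed above, so here is only a rough sketch of the two-tier idea, under assumed service names (`nginx_stream`, `nginx_proxy`) and an assumed overlay network; the nginx configuration itself is summarised in comments rather than reproduced:

```yaml
version: "3.2"

services:
  # Tier 1: global, host-mode published, so it receives the real client IP.
  # Its nginx.conf would use a stream {} server with `proxy_protocol on;` so the
  # client address is passed upstream in the PROXY protocol header.
  nginx_stream:
    image: nginx:stable
    deploy:
      mode: global
    ports:
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    networks:
      - proxy

  # Tier 2: replicated HTTP proxy; it listens with `proxy_protocol` and uses the
  # realip module (real_ip_header proxy_protocol;) to restore the client IP
  # before forwarding to the application services.
  nginx_proxy:
    image: nginx:stable
    deploy:
      replicas: 2
    networks:
      - proxy

networks:
  proxy:
    driver: overlay
```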
@jamiejackson the “least bad” workaround we’ve found is using Traefik as a global service in host mode. They have a good generic example in their docs. We’ve seen some bugs that may or may not be related to this setup, but Traefik is a great project and it seems pretty stable on Swarm. There’s a whole thread on their issues page on it (that loops back here 😃 ), with similar workarounds: https://github.com/containous/traefik/issues/1880
Hope this helps. We also can’t use a solution that doesn’t allow us to check actual requester IPs so we’re stuck with this kludge fix until something changes. It seems like a pretty common need, for security reasons at least.
Not 100% sure what you mean, but externally we use a DNS with an A record per cluster node. This provides cheap "balancing" without having an external moving part. When a client makes a request, they choose a random A record and connect to 443 on one of the cluster nodes.
There, the reverse proxy that is running on that specific node and listening on 443 gets a native connection, including the actual client IP. That reverse proxy container then adds a header and forwards the connection to another internal container using the swarm overlay network (tasks.backend). Since it uses the tasks.backend target, it will also get a random A record for an internal service.
So in the strict sense, it is bypassing the magic of the overlay network that redirects the connection. It instead kind of replicates this behavior with the reverse proxy and adds a header. The final effect is the same (in a loose sense) as the magic of the overlay network. It also does this in parallel to running the swarm, meaning I can run all my other services that do not require the client IP on the same cluster without doing anything else for those.
By no means a perfect solution but until a fix is made (if ever) it gets you by without external components or major docker configuration.
@sandys sure, here is an excerpt from our docker-compose with the relevant containers.
This is the reverse proxy docker-compose entry:
This is the backend service entry:
The target of the reverseproxy (the backend side) would be `tasks.backendservice` (which has A records for every replica). You can skip the `networks` part if the backend service is on the default swarm overlay network.

The `global` bit says "deploy this container exactly once on every Docker swarm node". The ports `mode: host` is the one saying "bind to the native NIC of the node".

Hope it helps.
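Since the original compose excerpts are collapsed above, here is a minimal sketch of what such entries typically look like, reconstructed only from the description in this comment (image names are placeholders):

```yaml
version: "3.2"

services:
  reverseproxy:
    image: example/reverseproxy      # placeholder image
    deploy:
      mode: global                   # exactly one instance per swarm node
    ports:
      - target: 443
        published: 443
        protocol: tcp
        mode: host                   # bind to the node's native NIC: real client IP preserved
    networks:
      - appnet
    # the proxy adds a header with the client IP and forwards to tasks.backendservice

  backendservice:
    image: example/backend           # placeholder image
    deploy:
      replicas: 3
    networks:
      - appnet

networks:
  appnet:
    driver: overlay
```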
@jamiejackson that’s where things will be a bit different. In our case we are running a server that hosts long-running SSL connections and a custom binary protocol underneath so HTTP proxies were not possible. So we created a simple TCP forwarder and used a “msgpack” header that we could unpack manually on the internal server.
I’m not super familiar with HTTP proxies but I suspect most of them would do the trick for you. 😕
@adijes, and other users who are facing this issue: you can bind the containers to the `bridge` network (as mentioned by someone in this thread).

Our `frontend` is bound to `bridge` and always stays on one specific host, whose IP is bound to our public domain. This enables it to receive the real user IP. And because it is also bound to the `default` network, it is able to connect to the backend services.

You can also scale the `frontend`, as long as you keep it on that single host. This makes the host a single point of failure, but (I think) it's OK for a small site.

Edited to add more information:

My nginx containers are behind https://github.com/jwilder/nginx-proxy, and I also use https://github.com/JrCs/docker-letsencrypt-nginx-proxy-companion to enable SSL. The nginx-proxy is run via a `docker run` command, not as a docker swarm service. Perhaps that's why I get the real IP from clients. The `bridge` network is required to allow my nginx containers to communicate with nginx-proxy.

FWIW, I'm using:

The above setup also works on another setup, which is running:
@dack Backends must understand the proxy protocol. I think it solves most cases, and at the very least you can put a thin passthrough-style proxy that processes the protocol header in front of your backends inside the containers. Because the lack of source information is a deadly issue, I believe it needs to be solved as fast as possible, ahead of any neater solution.
It is really a pity that it is not possible to get the client's IP; this makes most of docker swarm's nice features unusable.
On my setup the only way to get the client's IP is to use `network_mode: host` and not use swarm at all.

Using `mode=host` port publishing or a traditional `docker run -p "80:80" ...` did not work.

Some solutions were suggested in https://github.com/moby/moby/issues/15086 but the only solution that worked for me was "host" networking…
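For completeness, that host-networking fallback looks roughly like this in a plain docker-compose file; a sketch only, with placeholder names (note that `docker stack deploy` ignores `network_mode`, so this really does mean running outside swarm mode):

```yaml
version: "3"

services:
  web:
    image: example/web-app     # placeholder image
    network_mode: host         # share the host's network stack; the app sees real client IPs
    # no ports: mapping is needed (or possible); the app listens directly on the
    # host, e.g. on port 80
```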
I’m running up against this issue again.
My setup is as follows:
I would like to deploy a stack to the swarm and have it listen on port 80 on the virtual IP without mangling the addresses.
I can almost get there by doing this:

    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
The problem here is that it doesn't allow you to specify which IP address to bind to - it just binds to all. This creates problems if you want to run more than a single service using that port. It needs to bind only to the one IP. Using different ports isn't an option with DR load balancing. It seems that the devs made the assumption that the same IP will never exist on multiple nodes, which is not the case when using a DR load balancer.
In addition, if you use the short syntax, it will ignore the bind IP and still bind to all addresses. The only way I’ve found to bind to a single IP is to run a non-clustered container (not a service or stack).
So now I’m back to having to use standalone containers and having to manage them myself instead of relying on service/stack features to do that.
@blazedd Have you tried it? I'm getting external IP addresses when following @mostolog's example.
Why not use IPVS to route to the container network directly? Bind all swarm nodes' overlay interface IPs as virtual IPs, and use `ip rule from xxx table xxx` to set up multiple gateways; then swarm nodes can route clients to containers directly (DNAT), without any userspace network proxy daemon (dockerd).
to make multi-gateway, then swarm nodes can route client to container directly(DNAT), without any userspace network proxy daemon(dockerd)mirroring the comment above - can proxy protocol not be used ? All cloud load balancers and haproxy use this for source ip preservation.
Calico also has an ipip mode - https://docs.projectcalico.org/v2.2/usage/configuration/ip-in-ip - which is one of the reasons why GitHub uses it. https://githubengineering.com/kubernetes-at-github/
@tonysongtl that’s not related to this issue
@tkeeler33 `--opt encrypted` should not affect host-port mapping. The only purpose of the encrypted option is to encrypt the vxlan tunnel traffic between the nodes. From the docs: "If you are planning on creating an overlay network with encryption (--opt encrypted), you will also need to ensure protocol 50 (ESP) traffic is allowed." Can you please check your configuration to make sure ESP is allowed? Also, the `--opt encrypted` option is purely data-plane encryption. All the control-plane traffic (routing exchanges, Service Discovery distribution, etc.) is encrypted by default even without the option.

@hamburml - keep an eye on https://github.com/docker/docker/issues/30447, it's an open issue/feature.
Sorry for the double post… How can I use a stack file (yml v3) to get the same behaviour as when I would use `--publish mode=host,target=80,published=80` via docker service create?

I tried

but that's not working (used the same pattern as in https://docs.docker.com/docker-cloud/apps/stack-yaml-reference/#/ports).
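For later readers: the long-form ports syntax added in compose file format 3.2 maps to that same flag; a minimal sketch (service and image names are placeholders):

```yaml
version: "3.2"

services:
  proxy:
    image: nginx:stable
    deploy:
      mode: global        # typically combined with host mode: one instance per node
    ports:
      - target: 80        # container port
        published: 80     # host port
        protocol: tcp
        mode: host        # equivalent to --publish mode=host,target=80,published=80
```

The short `80:80` syntax always publishes via the ingress routing mesh, so it cannot express `mode=host`.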
@mavenugo I updated to docker 1.13 today and used `mode=host` on my proxy service. Currently it works and the client IP is preserved, but I hope for a better solution 😃 Thanks for your work!

Would love to see a custom header added to the http/https request which preserves the client IP. This should be possible, shouldn't it? I don't mind if X-Forwarded-For is overwritten, I just want to have a custom field which is only set the very first time the request enters the swarm.
Sure, and yes, a doc update to indicate this behavior and the workaround of using publish `mode=host` will be useful for such use-cases that fail in LVS-NAT mode.

Fair enough I guess @mavenugo, given we have an alternative now.
At the very least, can we amend the docs for 1.13 so they clearly state that when using docker services with the default ingress publishing mode, the source IP is not preserved, and hint at using host mode if this is a requirement for running the service?

I think it will help people who are migrating to services not to be burnt by this unexpected behaviour.

@aluzzardi any update for us?
@sanimej A good idea could be to add all IPs to the X-Forwarded-For header, if that's possible; then we could see the whole chain.

@PanJ hmm, and how does your standalone nginx container communicate with the swarm instance, via service name or IP? Maybe you can share the part of the nginx config where you pass requests on to the swarm instance.
@sanimej I kinda saw how it works when I dug into the issue. But the use case (ability to retrieve user’s IP) is quite common.
I have limited knowledge of how the fix should be implemented. Maybe a special type of network that does not alter the source IP address?

Rancher is similar to Docker swarm mode and it seems to have the expected behavior. Maybe it is a good place to start.
@PanJ The way the published port of a container is accessed is different in swarm mode. In swarm mode a service can be reached from any node in the cluster. To facilitate this we route through an `ingress` network. `10.255.0.x` is the address of the `ingress` network interface on the host in the cluster from which you try to reach the published port.