moby: Swarm worker cannot connect to master if proxy is configured

Description

Beginning with Docker CE 17.09, a Swarm worker connects to its master through the configured HTTP proxy. Unfortunately this fails, because Docker stores the master’s IP address instead of its hostname, so the address never matches the domain suffixes configured in NO_PROXY.
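
For context: dockerd is written in Go, and Go’s standard proxy resolution matches NO_PROXY entries against the host string of the target URL, never against resolved IP addresses. The following standalone sketch (plain net/http, not moby code) illustrates why the stored IP address goes through the proxy while the FQDN does not:

    package main

    import (
        "fmt"
        "net/http"
        "os"
    )

    func main() {
        // The same values as in the systemd drop-in from the reproduction steps.
        os.Setenv("HTTP_PROXY", "http://proxy.internal.company.com:8080")
        os.Setenv("NO_PROXY", "localhost,127.0.0.1,.internal.company.com")

        targets := []string{
            "http://192.168.1.11:2377",                // stored master IP: matches nothing in NO_PROXY
            "http://master.internal.company.com:2377", // FQDN: matches .internal.company.com
        }
        for _, target := range targets {
            req, _ := http.NewRequest("GET", target, nil)
            proxyURL, _ := http.ProxyFromEnvironment(req)
            if proxyURL == nil {
                fmt.Println(target, "-> direct connection (NO_PROXY match)")
            } else {
                fmt.Println(target, "-> via proxy", proxyURL)
            }
        }
    }

With the values above, the first target is routed via the proxy and the second one connects directly.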

Steps to reproduce the issue:

  1. Set up Docker as usual. Add a proxy configuration like this to /etc/systemd/system/docker.service.d/proxy.conf:

    [Service]
    Environment="HTTP_PROXY=http://proxy.internal.company.com:8080" "HTTPS_PROXY=http://proxy.internal.company.com:8080" "NO_PROXY=localhost,127.0.0.1,.internal.company.com"
    

    Don’t forget to run systemctl daemon-reload and restart the Docker daemon.
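
    For example:

    # systemctl daemon-reload
    # systemctl restart docker

    The proxy settings should then show up in docker info (compare the full output below):

    # docker info | grep -i proxy
    Http Proxy: http://proxy.internal.company.com:8080
    Https Proxy: http://proxy.internal.company.com:8080
    No Proxy: localhost,127.0.0.1,.internal.company.com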

  2. Initialize the swarm on the master node:

    master# docker swarm init
    Swarm initialized: current node (0tf92vjqqivr9mujdd99ip6ob) is now a manager.
    
    To add a worker to this swarm, run the following command:
    
        docker swarm join --token SWMTKN-1-20n86wgp1jlfdku52x5pzlcwc6youavmzm78junqe7wtxqg4t5-2orbo6etiiknwjcfj8g52chtr 192.168.1.11:2377
    
    To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
    
  3. Add a worker as indicated:

    worker# docker swarm join --token SWMTKN-1-20n86wgp1jlfdku52x5pzlcwc6youavmzm78junqe7wtxqg4t5-2orbo6etiiknwjcfj8g52chtr 192.168.1.11:2377
    

    The swarm join hangs because Docker attempts to connect to the master through the proxy, and eventually reports:

    Error response from daemon: rpc error: code = Unavailable desc = grpc: the connection is unavailable
    

    Repeat with an FQDN and the join succeeds right away:

    # docker swarm join --token SWMTKN-1-20n86wgp1jlfdku52x5pzlcwc6youavmzm78junqe7wtxqg4t5-2orbo6etiiknwjcfj8g52chtr master.internal.company.com:2377
    This node joined a swarm as a worker.
    

    However, after a dockerd restart the worker cannot connect to the master again: the daemon reconnects to the stored manager IP address (see Manager Addresses in the docker info output below), which again goes through the proxy.

Describe the results you received: See above.

Describe the results you expected: A swarm worker should either ignore the proxy settings for cluster traffic, store the master’s FQDN instead of its IP address, or offer an option that controls whether the proxy is used for master connections.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

# docker version
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:41:23 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:49 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 17.09.0-ce
Storage Driver: zfs
 Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax
 Zpool Health: not available
 Parent Dataset: apppool/docker
 Space Used By Parent: 405504
 Space Available: 20671299584
 Parent Quota: no
 Compression: lz4
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 6c839q9vfo18ucl8g7yykwukg
 Is Manager: false
 Node Address: 192.168.1.12
 Manager Addresses:
  192.168.1.11:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.5.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 39.12GiB
Name: worker
ID: PLND:NJOE:YUR6:XJLP:HYEM:IO4I:KS4G:5BYP:ZOOD:BEJH:G2FR:4VVE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://proxy.internal.company.com:8080
Https Proxy: http://proxy.internal.company.com:8080
No Proxy: localhost,127.0.0.1,.internal.company.com
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): OS is RHEL 7.4 running on vSphere 6.5

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 15 (3 by maintainers)

Most upvoted comments

The only way this seems to work is if all managers are present in the NO_PROXY setting. But, as someone else stated (I think in a related issue), this is pretty poor from a management standpoint, as you have to restart all manager nodes if your manager participants change.
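
For example, extending the drop-in from the reproduction steps (the manager address is illustrative, and every manager would need an entry):

    [Service]
    Environment="HTTP_PROXY=http://proxy.internal.company.com:8080" "HTTPS_PROXY=http://proxy.internal.company.com:8080" "NO_PROXY=localhost,127.0.0.1,.internal.company.com,192.168.1.11"

followed by systemctl daemon-reload and a daemon restart on each node, which is exactly the restart burden mentioned above.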

This approach seems very opinionated and rigid. Internal cluster communication between nodes has nothing to do with communication with the outside world. Therefore, I believe it makes a lot of sense to have separate ways to configure this.

I’m actually surprised this hasn’t been addressed yet.

Using v18.03.

@thaJeztah I’m just saying that, AFAICT, this issue isn’t fixed in 17.12: configure a swarm with a proxy and the masters will attempt to use the proxy to talk to each other, which fails. The registry comes into it because the proxy is only configured in the first place for lack of direct internet access; the nodes need the proxy to reach the registry, but not each other (and the proxy cannot connect back into the swarm anyway).

But like I said I haven’t personally tried it with 17.12, only 17.09.