moby: grpc: the connection is unavailable and load balancer broken

Description

Since the last update (Docker 1.13.0 to 1.13.1), I've had major inconsistencies.

Load balancing is now broken: 50% of requests end with "No route to host" and the other 50% work.
We often get "grpc: the connection is unavailable" when running docker exec on some of the containers.
Tab completion of names is also completely broken and prints nonsense. Here I typed "docker service ps P" and pressed Tab to complete names starting with P:

docker service ps P__docker_daemon_is_experimental: command not found rod___docker_daemon_is_experimental: command not found __docker_daemon_is_experimental: command not found

But, as the version output below shows, this Docker build is not experimental.

These two issues appeared after we updated yesterday and make the whole platform barely usable.

Steps to reproduce the issue:

Upgrade Docker from 1.13.0 to 1.13.1 in swarm mode.

Describe the results you received:

"No route to host" on 50% of requests
"grpc: the connection is unavailable" for docker exec on many containers
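To put a number on the "No route to host" rate instead of estimating it, one option is to open many fresh TCP connections to a published service port and tally how each one ends. The sketch below does that with only the Python standard library. It is an illustration, not a Docker tool: the function name probe, the attempt count and the timeout are assumptions, and it only exercises the TCP connect, not the application protocol behind the port. Because every attempt is a new connection, the swarm routing mesh may send successive attempts to different tasks, so a failure rate near 50% might point to some backend tasks being unreachable from the probed node.

```python
import errno
import socket
from collections import Counter


def probe(host: str, port: int, attempts: int = 20, timeout: float = 2.0) -> Counter:
    """Open `attempts` fresh TCP connections to host:port and tally the outcomes.

    Hypothetical helper. Keys are "ok", "timeout", or the errno name of the
    failure; "EHOSTUNREACH" is the errno behind "No route to host".
    """
    results: Counter = Counter()
    for _ in range(attempts):
        try:
            # A new socket per attempt, so a load balancer can pick a new backend each time.
            with socket.create_connection((host, port), timeout=timeout):
                results["ok"] += 1
        except socket.timeout:
            results["timeout"] += 1
        except OSError as exc:
            # Map the numeric errno to its symbolic name when one exists.
            name = errno.errorcode.get(exc.errno, "unknown") if exc.errno else "unknown"
            results[name] += 1
    return results
```

For example, probe("10.0.0.6", 8080) would test a hypothetical service published on port 8080 through this node's address (10.0.0.6, from the docker info output below); a result split roughly evenly between "ok" and "EHOSTUNREACH" would match the symptom described here.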

Output of docker version:

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 116
 Running: 6
 Paused: 0
 Stopped: 110
Images: 133
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1509
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: bk34jzemg6u4eq7bdjqsq6u69
 Is Manager: true
 ClusterID: 0apmbfyv7tr52j046zpefpgpn
 Managers: 7
 Nodes: 7
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.0.0.6
 Manager Addresses:
  10.0.0.10:2377
  10.0.0.11:2377
  10.0.0.4:2377
  10.0.0.6:2377
  10.0.0.7:2377
  10.0.0.8:2377
  10.0.0.9:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.359 GiB
Name: SWLNCANLS01
ID: 5POZ:4Q7W:OKMN:BTKQ:B7K3:UPRO:J5PA:3QMA:KMAQ:DM6L:7RDW:2LHL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: 
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Labels:
 type=Small
 AzureType=D1_V2
 Name=Small01
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

We are hosted in Azure. Hosts run Ubuntu 16.04.2 LTS (kernel 4.4.0-62-generic), fully up to date.

About this issue

State: closed
Created 7 years ago
Reactions: 3
Comments: 25 (4 by maintainers)

Most upvoted comments

Still around:

Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 10:57:47 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 10:57:47 2017
 OS/Arch:      linux/amd64
 Experimental: false

time="2017-03-22T17:43:14.260489169Z" level=error msg="Create container failed with error: grpc: the connection is unavailable" 
time="2017-03-22T17:43:14.514016752Z" level=error msg="Handler for POST /containers/3ce7df11cccb723a0d632acb85e7721316dfd13ae24ecd3ec705d5357e4b7f09/start returned error: grpc: the connection is unavailable" 
time="2017-03-22T17:43:26.567430637Z" level=error msg="stream copy error: reading from a closed fifo\ngithub.com/docker/docker/vendor/github.com/tonistiigi/fifo.(*fifo).Read\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/vendor/github.com/tonistiigi/fifo/fifo.go:142\nbufio.(*Reader).fill\n\t/usr/local/go/src/bufio/bufio.go:97\nbufio.(*Reader).WriteTo\n\t/usr/local/go/src/bufio/bufio.go:472\nio.copyBuffer\n\t/usr/local/go/src/io/io.go:380\nio.Copy\n\t/usr/local/go/src/io/io.go:360\ngithub.com/docker/docker/pkg/pools.Copy\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/pkg/pools/pools.go:60\ngithub.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/container/stream/streams.go:119\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2086" 
time="2017-03-22T17:43:26.567669433Z" level=error msg="Create container failed with error: grpc: the connection is unavailable" 
time="2017-03-22T17:43:26.567693163Z" level=error msg="stream copy error: reading from a closed fifo\ngithub.com/docker/docker/vendor/github.com/tonistiigi/fifo.(*fifo).Read\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/vendor/github.com/tonistiigi/fifo/fifo.go:142\nbufio.(*Reader).fill\n\t/usr/local/go/src/bufio/bufio.go:97\nbufio.(*Reader).WriteTo\n\t/usr/local/go/src/bufio/bufio.go:472\nio.copyBuffer\n\t/usr/local/go/src/io/io.go:380\nio.Copy\n\t/usr/local/go/src/io/io.go:360\ngithub.com/docker/docker/pkg/pools.Copy\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/pkg/pools/pools.go:60\ngithub.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/container/stream/streams.go:119\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2086" 
time="2017-03-22T17:43:26.810115561Z" level=error msg="Handler for POST /containers/84c166a4a4da700779e076ff800c352b90da435d3eca9c474cf9096679c4fd21/start returned error: grpc: the connection is unavailable"
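When a daemon log like this grows long, it can help to see how many failures are the gRPC error versus the "closed fifo" stream errors. The lines use a key=value layout (time, level, msg, with quoted values), so a short Python sketch can tally error-level messages, keeping only the first line of each msg (the stack trace follows an escaped \n) and replacing 64-character container IDs with <id>. The helper names parse_line and summarize are hypothetical, and the parsing assumes every line follows this layout.

```python
import re
from collections import Counter

# key=value or key="quoted value" (with backslash escapes), as in the log lines above.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# Full 64-hex-digit container IDs, e.g. in "/containers/<id>/start".
CONTAINER_ID = re.compile(r"\b[0-9a-f]{64}\b")


def parse_line(line: str) -> dict:
    """Split one log line into its key/value fields, stripping surrounding quotes."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def summarize(lines) -> Counter:
    """Count error-level messages, ignoring stack traces and container IDs."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") != "error":
            continue
        # The stack trace follows a literal backslash-n inside msg; keep the first line only.
        msg = fields.get("msg", "").split("\\n", 1)[0]
        counts[CONTAINER_ID.sub("<id>", msg)] += 1
    return counts
```

Fed the six error lines above, it should report two occurrences each of "Create container failed with error: grpc: the connection is unavailable", "Handler for POST /containers/<id>/start returned error: grpc: the connection is unavailable" and "stream copy error: reading from a closed fifo".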

# docker info
Containers: 706
 Running: 685
 Paused: 0
 Stopped: 21
Images: 39
Server Version: 17.03.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1514
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: 977c511eda0925a723debdc94d09459af49d082a)
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.4.0-66-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.797 GiB
Name: REMOVED
ID: 2L6N:3VHZ:ZW5O:2VL4:BBP2:EKNE:33VQ:BOOB:Z7D3:PHUF:WJ77:COVB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

DamionWaltermeyer on Mar 22, 2017

We are experiencing this issue running 1.13.1 on Photon. The swarm randomly loses connectivity to approximately half of the containers, which makes it useless. Let me know if I can provide anything to help with troubleshooting.

thomsonac on Jun 14, 2017