moby: grpc: the connection is unavailable and load balancer broken

Description

Since the last update (Docker 1.13.0 to 1.13.1), I’ve had major inconsistencies.

  • The load balancing is now broken: 50% of the requests end with "no route to host"; the other 50% work.

  • We often get grpc: the connection is unavailable when trying to do a docker exec on some of the containers.

  • Tab completion of container names is completely broken and prints nonsense. Here I typed “docker service ps P” to search for containers starting with P:

docker service ps P__docker_daemon_is_experimental: command not found
rod___docker_daemon_is_experimental: command not found
__docker_daemon_is_experimental: command not found

But as you can see below, this version of Docker is not experimental.

These 2 issues are new since we updated yesterday and make the whole platform barely usable.

Steps to reproduce the issue:

  1. Upgrade docker 1.13.0 to 1.13.1 in swarm mode

Describe the results you received:

  • No route to host for 50% of the requests

  • grpc: the connection is unavailable for docker exec on lots of containers
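
To put a number on the "50%" figure, repeated requests could be sent to a published service port and the failures counted. The following is only a minimal Python sketch: the helper names `http_probe` and `failure_rate`, and the idea of probing over HTTP, are assumptions for illustration and not part of the original report.

```python
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if the HTTP request succeeds, False on an OSError.

    OSError covers urllib's URLError and socket timeouts, which is how
    "no route to host" would surface here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        return False


def failure_rate(probe, attempts):
    """Call probe() `attempts` times and return the fraction that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not probe())
    return failures / attempts
```

For example, `failure_rate(lambda: http_probe(url), 100)` with `url` set to a service address on one of the nodes would return a value near 0.5 if roughly half of the requests fail, as described above.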

Output of docker version:

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 116
 Running: 6
 Paused: 0
 Stopped: 110
Images: 133
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1509
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: bk34jzemg6u4eq7bdjqsq6u69
 Is Manager: true
 ClusterID: 0apmbfyv7tr52j046zpefpgpn
 Managers: 7
 Nodes: 7
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.0.0.6
 Manager Addresses:
  10.0.0.10:2377
  10.0.0.11:2377
  10.0.0.4:2377
  10.0.0.6:2377
  10.0.0.7:2377
  10.0.0.8:2377
  10.0.0.9:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.359 GiB
Name: SWLNCANLS01
ID: 5POZ:4Q7W:OKMN:BTKQ:B7K3:UPRO:J5PA:3QMA:KMAQ:DM6L:7RDW:2LHL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: 
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Labels:
 type=Small
 AzureType=D1_V2
 Name=Small01
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
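
One detail in the output above is the `containerd version: N/A (expected: ...)` line. A plausible reading, not confirmed anywhere in this thread, is that the daemon could not obtain a version from containerd, which would fit the grpc errors. The sketch below flags such lines in saved `docker info` text; the function name and the regular expression are illustrative assumptions.

```python
import re

# Matches lines such as "containerd version: N/A (expected: aa8187d...)".
_VERSION_LINE = re.compile(
    r"^\s*(containerd|runc|init) version:\s*(\S+)"
    r"(?:\s*\(expected:\s*([^)\s]+)\))?\s*$"
)


def version_mismatches(info_text):
    """Return {component: (reported, expected)} for lines with an '(expected: ...)' note."""
    found = {}
    for line in info_text.splitlines():
        m = _VERSION_LINE.match(line)
        if m and m.group(3) is not None:
            found[m.group(1)] = (m.group(2), m.group(3))
    return found


# Lines copied from the docker info output above.
sample = """\
containerd version: N/A (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
"""
print(version_mismatches(sample))
```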

We are hosted in Azure. Hosts run Ubuntu 16.04.2 LTS, kernel 4.4.0-62-generic, fully up to date.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 25 (4 by maintainers)

Most upvoted comments

Still around:

Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 10:57:47 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 10:57:47 2017
 OS/Arch:      linux/amd64
 Experimental: false

time="2017-03-22T17:43:14.260489169Z" level=error msg="Create container failed with error: grpc: the connection is unavailable" 
time="2017-03-22T17:43:14.514016752Z" level=error msg="Handler for POST /containers/3ce7df11cccb723a0d632acb85e7721316dfd13ae24ecd3ec705d5357e4b7f09/start returned error: grpc: the connection is unavailable" 
time="2017-03-22T17:43:26.567430637Z" level=error msg="stream copy error: reading from a closed fifo\ngithub.com/docker/docker/vendor/github.com/tonistiigi/fifo.(*fifo).Read\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/vendor/github.com/tonistiigi/fifo/fifo.go:142\nbufio.(*Reader).fill\n\t/usr/local/go/src/bufio/bufio.go:97\nbufio.(*Reader).WriteTo\n\t/usr/local/go/src/bufio/bufio.go:472\nio.copyBuffer\n\t/usr/local/go/src/io/io.go:380\nio.Copy\n\t/usr/local/go/src/io/io.go:360\ngithub.com/docker/docker/pkg/pools.Copy\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/pkg/pools/pools.go:60\ngithub.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/container/stream/streams.go:119\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2086" 
time="2017-03-22T17:43:26.567669433Z" level=error msg="Create container failed with error: grpc: the connection is unavailable" 
time="2017-03-22T17:43:26.567693163Z" level=error msg="stream copy error: reading from a closed fifo\ngithub.com/docker/docker/vendor/github.com/tonistiigi/fifo.(*fifo).Read\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/vendor/github.com/tonistiigi/fifo/fifo.go:142\nbufio.(*Reader).fill\n\t/usr/local/go/src/bufio/bufio.go:97\nbufio.(*Reader).WriteTo\n\t/usr/local/go/src/bufio/bufio.go:472\nio.copyBuffer\n\t/usr/local/go/src/io/io.go:380\nio.Copy\n\t/usr/local/go/src/io/io.go:360\ngithub.com/docker/docker/pkg/pools.Copy\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/pkg/pools/pools.go:60\ngithub.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1\n\t/usr/src/docker/.gopath/src/github.com/docker/docker/container/stream/streams.go:119\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2086" 
time="2017-03-22T17:43:26.810115561Z" level=error msg="Handler for POST /containers/84c166a4a4da700779e076ff800c352b90da435d3eca9c474cf9096679c4fd21/start returned error: grpc: the connection is unavailable" 
# docker info
Containers: 706
 Running: 685
 Paused: 0
 Stopped: 21
Images: 39
Server Version: 17.03.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1514
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: 977c511eda0925a723debdc94d09459af49d082a)
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.4.0-66-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.797 GiB
Name: REMOVED
ID: 2L6N:3VHZ:ZW5O:2VL4:BBP2:EKNE:33VQ:BOOB:Z7D3:PHUF:WJ77:COVB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
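
The daemon log lines in this comment are hard to scan because each stack trace is embedded in the `msg` field. A small Python sketch like the one below could group the error messages by their first line; the function name and the `<id>` placeholder are assumptions for illustration, and the sample shortens the stack trace from the log above.

```python
import re
from collections import Counter

# logrus text format as seen in the log above: time="..." level=... msg="..."
_ENTRY = re.compile(
    r'time="(?P<time>[^"]*)"\s+level=(?P<level>\w+)\s+'
    r'msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)


def summarize_errors(log_text):
    """Count error-level messages, keyed by the first line of each message."""
    counts = Counter()
    for m in _ENTRY.finditer(log_text):
        if m.group("level") != "error":
            continue
        # Stack traces follow a literal backslash-n; keep only the headline.
        headline = m.group("msg").split("\\n", 1)[0]
        # Replace 64-character container IDs so identical failures group together.
        headline = re.sub(r"\b[0-9a-f]{64}\b", "<id>", headline)
        counts[headline] += 1
    return counts


# Sample taken from the log above; the stack trace is shortened here.
sample = r'''
time="2017-03-22T17:43:14.260489169Z" level=error msg="Create container failed with error: grpc: the connection is unavailable"
time="2017-03-22T17:43:26.567430637Z" level=error msg="stream copy error: reading from a closed fifo\ngithub.com/docker/docker/vendor/github.com/tonistiigi/fifo.(*fifo).Read"
time="2017-03-22T17:43:26.567669433Z" level=error msg="Create container failed with error: grpc: the connection is unavailable"
'''
for message, count in summarize_errors(sample).most_common():
    print(count, message)
```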

We are experiencing this issue running 1.13.1 on Photon. At random, the swarm loses connectivity to approximately half of the containers, which makes the swarm useless. Let me know if I can provide anything to help with troubleshooting.