moby: grpc: the connection is unavailable and load balancer broken
Description
Since the last update (docker 1.13.0 to 1.13.1) I've had major inconsistencies.
- The load balancing is now broken: 50% of the requests end with “no route to host”, the other 50% work.
- We often get “grpc: the connection is unavailable” when trying to do a docker exec on some of the containers.
- Tab completion for container names is completely broken and produces nonsense. Here I typed “docker service ps P” to complete container names starting with P:
docker service ps P__docker_daemon_is_experimental: command not found rod___docker_daemon_is_experimental: command not found __docker_daemon_is_experimental: command not found
But as you can see below, this version of docker is not experimental.
These two issues are new since we updated yesterday and make the whole platform barely usable.
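For reference, this is roughly how both failures show up from a manager node. The published port 8080 and the container id are placeholders (10.0.0.6 is the node address from the docker info output below):

# requests through the routing mesh: about half succeed, the rest fail with "no route to host"
for i in $(seq 1 20); do curl -sS -o /dev/null http://10.0.0.6:8080/ || true; done

# exec into one of the affected containers: on many of them this fails with
# "grpc: the connection is unavailable"
docker exec -it <container_id> sh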
Steps to reproduce the issue:
- Upgrade docker 1.13.0 to 1.13.1 in swarm mode
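For completeness, the in-place upgrade on Ubuntu 16.04 was done roughly like this; docker 1.13.x shipped as the docker-engine package, and the exact version string below is an assumption:

# run on each node
apt-get update
apt-get install docker-engine=1.13.1-0~ubuntu-xenial   # version string assumed; check with: apt-cache madison docker-engine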
Describe the results you received:
- “No route to host” for 50% of the requests
- “grpc: the connection is unavailable” for docker exec on lots of containers
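A few basic checks that may help narrow this down; the service name is a placeholder:

# node availability/status as the managers see it
docker node ls
# tasks of an affected service and the node each one runs on
docker service ps <service_name>
# the ingress overlay network used by the routing mesh
docker network inspect ingress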
Output of docker version:
Client:
Version: 1.13.1
API version: 1.26
Go version: go1.7.5
Git commit: 092cba3
Built: Wed Feb 8 06:50:14 2017
OS/Arch: linux/amd64
Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 092cba3
Built: Wed Feb 8 06:50:14 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 116
Running: 6
Paused: 0
Stopped: 110
Images: 133
Server Version: 1.13.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 1509
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: bk34jzemg6u4eq7bdjqsq6u69
Is Manager: true
ClusterID: 0apmbfyv7tr52j046zpefpgpn
Managers: 7
Nodes: 7
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: 10.0.0.6
Manager Addresses:
10.0.0.10:2377
10.0.0.11:2377
10.0.0.4:2377
10.0.0.6:2377
10.0.0.7:2377
10.0.0.8:2377
10.0.0.9:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.359 GiB
Name: SWLNCANLS01
ID: 5POZ:4Q7W:OKMN:BTKQ:B7K3:UPRO:J5PA:3QMA:KMAQ:DM6L:7RDW:2LHL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username:
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Labels:
type=Small
AzureType=D1_V2
Name=Small01
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
We are hosted in Azure. Hosts are on Ubuntu 16.04.2 LTS, kernel 4.4.0-62-generic, fully up to date.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 3
- Comments: 25 (4 by maintainers)
Still around:
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 10:57:47 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 10:57:47 2017
OS/Arch: linux/amd64
Experimental: false
We are experiencing this issue running 1.13.1 on Photon. Randomly the swarm loses connectivity to approximately half of the containers, which makes the swarm useless. Let me know if I can provide anything to help with troubleshooting.