grpc-go: all SubConns are in TransientFailure, latest connection error:
What version of gRPC are you using?
1.18.0
What version of Go are you using (`go version`)?
1.11.2
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
Ran gRPC requests in production through a local Envoy sidecar, so every connection goes to a local process over a Unix socket. WaitForReady was not set to true for these calls.
What did you expect to see?
I'd expect connections to a local Unix-socket sidecar to be in the TRANSIENT_FAILURE state only very rarely, and when they were, I'd expect the error message to include a non-nil connection error. We use the default balancer implementation, but we do pass in a custom dialer:
```go
envoySocketDialer := func(address string, timeout time.Duration) (net.Conn, error) {
	// address and timeout are ignored: every connection goes to the
	// local Envoy sidecar's Unix socket.
	return net.DialUnix("unix", nil, &net.UnixAddr{
		Name: envoySocketPath,
		Net:  "unix",
	})
}

options := []grpc.DialOption{
	...
	grpc.WithDialer(envoySocketDialer), // Override the dialer as a workaround for https://github.com/grpc/grpc-go/issues/2510
}
```
What did you see instead?
A small percentage (<1%) of RPCs failed with an "all SubConns are in TransientFailure, latest connection error: <nil>" error message. Without knowing much about the connection-pooling implementation, it seems like the error there should normally be non-nil.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 23
- Comments: 25 (12 by maintainers)
Commits related to this issue
- client: reset backoff to 0 after a connection is established (#2669) #2663 #2636 — committed to grpc/grpc-go by menghanl 5 years ago
- *: bump gRPC and protobuf dependencies The goal is to remove almost all references to the golang.org/x/net/context package. github.com/gogo/protobuf => v1.2.1 google.golang.org/grpc => v1.19.1 githu... — committed to simonpasquier/prometheus by simonpasquier 5 years ago
- *: bump gRPC and protobuf dependencies (#5367) The goal is to remove almost all references to the golang.org/x/net/context package. github.com/gogo/protobuf => v1.2.1 google.golang.org/grpc => v... — committed to prometheus/prometheus by simonpasquier 5 years ago
- Use HTTP2MatchHeaderFieldSendSettings for incoming gRPC connections gRPC clients which wait until they receive a `SETTINGS` frame may not be able to connect to CRI-O because of a limitation to cmux: ... — committed to openSUSE/cri-o by saschagrunert 5 years ago
- Update gRPC to the latest version The mentioned issue in 3d04f3b61c955b43a59fcdd54792c3d7e01190ff is related to a limitation of cmux, where clients may block until they receive a `SETTINGS` frame. My... — committed to openSUSE/cri-tools by saschagrunert 5 years ago
- Use HTTP2MatchHeaderFieldSendSettings for incoming gRPC connections gRPC clients which wait until they receive a `SETTINGS` frame may not be able to connect to CRI-O because of a limitation to cmux: ... — committed to openshift-cherrypick-robot/cri-o by saschagrunert 5 years ago
- Use HTTP2MatchHeaderFieldSendSettings for incoming gRPC connections gRPC clients which wait until they receive a `SETTINGS` frame may not be able to connect to CRI-O because of a limitation to cmux: ... — committed to haircommander/cri-o by saschagrunert 5 years ago
- Upgrade magma's GRPC-go version to 1.25.0 (#901) Summary: Pull Request resolved: https://github.com/facebookincubator/magma/pull/901 In production we are seeing: all SubConns are in TransientFailure... — committed to magma/magma by deleted user 5 years ago
- Upgrade magma's GRPC-go version to 1.25.0 (#901) Summary: Pull Request resolved: https://github.com/facebookincubator/magma/pull/901 In production we are seeing: all SubConns are in TransientFailure... — committed to gjalves/magma by deleted user 5 years ago
Still facing this issue.
Ran into the same problem after upgrading to 1.18.0; 1.17.0 works fine.