grpc-go: all SubConns are in TransientFailure, latest connection error:


What version of gRPC are you using?

1.18.0

What version of Go are you using (go version)?

1.11.2

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

Ran gRPC requests in production through a local Envoy sidecar, so every connection goes to a local process over a Unix domain socket. WaitForReady was not set to true for these calls.

What did you expect to see?

I’d expect connections to a local Unix-socket sidecar to be in the TRANSIENT_FAILURE state only very rarely, and when they are, I’d expect the error message to contain a non-nil error. We use the default balancer implementation, but we do pass in a custom dialer:

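	// envoySocketDialer ignores both the target address and the timeout that
	// gRPC passes in, and always connects to the local Envoy Unix socket.
	envoySocketDialer := func(address string, timeout time.Duration) (net.Conn, error) {
		return net.DialUnix("unix", nil, &net.UnixAddr{
			Name: envoySocketPath,
			Net:  "unix",
		})
	}
	options := []grpc.DialOption{
		// ... (other dial options elided)
		grpc.WithDialer(envoySocketDialer), // Override the dialer as a workaround for https://github.com/grpc/grpc-go/issues/2510
	}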

What did you see instead?

A small percentage (<1%) of RPCs failed with an “all SubConns are in TransientFailure, latest connection error: <nil>” error. Without knowing much about the connection pooling implementation, it seems like the error there should normally be non-nil.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 23
  • Comments: 25 (12 by maintainers)


Most upvoted comments

Still facing this issue.

Ran into the same problem when upgrading to 1.18.0; 1.17.0 works fine.