grpc-go: TLS failures in blocking Dial calls don't provide useful error messages

What version of gRPC are you using?

v1.9.2, but I’ve verified the same issue exists at the latest release (v1.11.3).

What version of Go are you using (go version)?

1.10

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

I’m using a blocking Dial (grpc.DialContext(..., grpc.WithBlock())) so that errors can be attributed to the dial phase rather than to individual RPC calls. One of the backends had an unexpected CN in its TLS certificate, so the connection was failing. The only error returned from DialContext was "context deadline exceeded"; the useful errors appeared only in the info log.

I expected a blocking dial to return some indication of why it had failed, instead of only timing out. In particular, including the most recent transport error in the returned error value would be very useful for reporting connection problems.

Based on the conversation in https://github.com/grpc/grpc-go/pull/1855 it sounds like this behavior is intentional (or, at least, expected). Is there any way to either get access to the transport-layer errors, or somehow propagate them up into the returned error value?

The very useful error printed to the logs:
W0427 21:43:33.738993 35711 clientconn.go:1167] grpc: addrConn.createTransport failed to connect to {badname.example.com 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for goodname.example.com, not badname.example.com". Reconnecting...

The not very useful error returned from DialContext:
W0427 21:43:33.773061 35711 handlers.go:180] Error dialing backend "badname.example.com": context deadline exceeded

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 16 (13 by maintainers)

Most upvoted comments

Is there any progress? The problem greatly complicates the analysis of connection problems.