pomerium: proxy: grpc client should retry connections to services on failure
Describe the bug
I restarted pomerium, tried to log in with my user, and got a 500 error. After refreshing the page, I was correctly logged in.
To Reproduce
Steps to reproduce the behavior:
- Restart pomerium with a fresh set of secrets (to ensure the user has to log in again)
- Go to a protected service and log in
- See a 500 error
Logs of the proxy:
{"level":"error","fwd_ip":"10.4.0.1","ip":"10.4.0.42","user_agent":"Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0","referer":"https://accounts.google.com/signin/oauth/oauthchooseaccount?client_id=XXXXXXXXX&flowName=GeneralOAuthFlow","req_id":"017ee31d-aad7-5207-a989-a834895ca395","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.7.240.43:443: i/o timeout\"","time":"2019-03-25T10:00:16Z","message":"proxy: error redeeming authorization code"}
There is no error in the authenticate service.
Expected behavior
User should be able to log in at any time 😃
Environment:
- Pomerium version (retrieve with pomerium --version): v0.0.2+45e6a8d
- Server Operating System/Architecture/Cloud: GKE / GSuite
About this issue
- State: closed
- Created 5 years ago
- Comments: 20 (9 by maintainers)
@travisgroth @desimone this is fixed in v3.0.0, thanks for your patience.
@desimone so I finally got back to this.
The problem is still present:
session
and see my session
Furthermore:
@victornoel We started handling transient gRPC issues a bit more gracefully with #261. Can you check on reproducing when you have a moment?
@desimone I will try v0.0.4 very soon and get back to you on this
@desimone I’m out of my depth in there 😃 I was hoping for some kind of very simple solution that would retry once on connection failure in this situation or something like that.
Let’s also note that this is not a severe bug, even though it’s not nice to experience.
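To make the "retry once on connection failure" idea concrete, here is a minimal sketch in Go. It is not pomerium's actual code or the fix that was eventually shipped. The names retryTransient, errUnavailable and redeem are hypothetical, and the transient error is a stdlib stand-in so the example runs without the gRPC module. With a real gRPC client, the check would presumably be status.Code(err) == codes.Unavailable, which matches the "code = Unavailable" error in the proxy log above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a hypothetical stand-in for a transient failure.
// With a real gRPC client the equivalent check would be
// status.Code(err) == codes.Unavailable.
var errUnavailable = errors.New("unavailable: transient connection failure")

// retryTransient calls fn once and, while it keeps failing with
// errUnavailable, calls it again up to maxRetries more times, sleeping
// backoff between attempts. Any other error is returned immediately.
func retryTransient(maxRetries int, backoff time.Duration, fn func() error) error {
	err := fn()
	for i := 0; i < maxRetries && errors.Is(err, errUnavailable); i++ {
		time.Sleep(backoff)
		err = fn()
	}
	return err
}

func main() {
	calls := 0
	// redeem is a hypothetical placeholder for the proxy's call to the
	// authenticate service: it fails once, then succeeds.
	redeem := func() error {
		calls++
		if calls == 1 {
			return errUnavailable
		}
		return nil
	}

	// Retry at most once, as suggested in the comment above.
	err := retryTransient(1, 10*time.Millisecond, redeem)
	fmt.Println("calls:", calls, "err:", err)
}
```

Retrying only on the transient error keeps real failures (bad credentials, invalid authorization code) visible to the user right away, while a single short retry would likely have hidden the 500 described in this issue.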