pomerium: proxy: grpc client should retry connections to services on failure

Describe the bug

I restarted pomerium, tried to log in with my user, and got a 500 error. After refreshing the page, I’m correctly logged in.

To Reproduce

Steps to reproduce the behavior:

  1. Restart pomerium with a fresh set of secrets (to ensure the user has to log in again)
  2. Go to a protected service and log in
  3. See a 500 error

Logs of the proxy:

{"level":"error","fwd_ip":"10.4.0.1","ip":"10.4.0.42","user_agent":"Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0","referer":"https://accounts.google.com/signin/oauth/oauthchooseaccount?client_id=XXXXXXXXX&flowName=GeneralOAuthFlow","req_id":"017ee31d-aad7-5207-a989-a834895ca395","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.7.240.43:443: i/o timeout\"","time":"2019-03-25T10:00:16Z","message":"proxy: error redeeming authorization code"}

There is no error in the authenticate service.

Expected behavior

User should be able to log in at any time 😃

Environment:

  • Pomerium version (retrieve with pomerium --version): v0.0.2+45e6a8d
  • Server Operating System/Architecture/Cloud: GKE / GSuite

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (9 by maintainers)

Most upvoted comments

@travisgroth @desimone this is fixed in v3.0.0, thanks for your patience.

@desimone so I finally got back to this.

The problem is still present:

  • I generate a new cookie/shared secret to be sure this behaves as a fresh start
  • I (re)start all the services
  • I connect to a protected app
  • I get redirected to the authenticate service, which redirects me to Google sign-in
  • I get redirected back to the protected app and the error below appears
  • I can click on session and see my session
  • I can refresh the page and access the protected app

image

Furthermore:

  • if I remove the cookies and reconnect to the app, I can sign in and get redirected successfully to the app.
  • if I restart the service (but do not delete the cookies), I can sign in and get redirected successfully to the app.
  • if I restart the service and delete the cookies, I can sign in but get the error above.

@victornoel We started handling transient gRPC issues a bit more gracefully with #261. Can you check whether you can still reproduce this when you have a moment?

@desimone I will try v0.0.4 very soon and get back to you on this

@desimone I’m out of my depth in there 😃 I was hoping for some kind of very simple solution that would retry once on connection failure in this situation or something like that.

Let’s also note that this is not a severe bug, even though it’s not nice to experience.