pomerium: controlplane: dropping event due to full channel: session invalidated
What happened?
Some user sessions return Error 500 for all routes; others fail intermittently.
What did you expect to happen?
Requests to proceed wherever policy allows them.
How’d it happen?
- Upgraded from Pomerium v0.15 to v0.18.
- While testing after the upgrade, the second Pomerium node did not appear to be handling authorization properly, so I stopped all services on it.
- I am now running on a single node, node 1, and some users get error 500 for all routes.
- I get intermittent error 500s, more often with streaming requests. Static sites either work or fail, and when one fails, a refresh lets the request through.
What’s your environment like?
- pomerium: 0.18.0-1658889797+89a105c8
- envoy: 1.21.3+4861429dfffb599f28b9399c34ea2a2c268bfb6d10aca0a53bc9b67d847a4595
- Pomerium console version: 0.18.0-1658971952 + 7b8e18a8 + 2022-07-27T21:30:12-04:00
- Server Operating System/Architecture/Cloud: 2x CentOS Stream on-premises virtual machines, plus separate Redis (I did not see the Redis deprecation) and Postgres VMs
What’s your config.yaml?
authenticate_service_url: https://example.com
certificates:
  - cert: /path/to/cert.pem
    key: /path/to/key.pem
signing_key: SECRET
metrics_address: localhost:9999
http_redirect_addr: :80
idp_provider: google
idp_client_id: clientid
idp_client_secret: secret
idp_service_account: serviceaccount
idp_refresh_directory_timeout: 10m
idp_refresh_directory_internal: 20m
cookie_secret: cookiesecret
shared_secret: sharedsecret
databroker_storage_type: redis
databroker_storage_connection_string: redis://:@redis:6379
policy:
  - from: https://console.example.com
    to: https://127.0.0.1:8701
    pass_identity_headers: true
    allowed_groups:
      - group@example.com
    allowed_users:
      - example@example.com
What did you see in the logs?
{
  "level": "error",
  "config_file_source": "/etc/pomerium/config.yaml",
  "bootstrap": true,
  "service": "identity_manager",
  "error": "identity/oidc: user info endpoint: 401 Unauthorized: {\"error\":\"invalid_request\",\"error_description\":\"Invalid Credentials\"}",
  "user_id": "xxx",
  "session_id": "xxx",
  "time": "2022-08-24T12:01:19-04:00",
  "message": "failed to update user info, deleting session"
}
{
  "level": "warn",
  "event": {
    "time": {
      "seconds": 1661363152,
      "nanos": 831339500
    },
    "message": "identity/oidc: user info endpoint: 401 Unauthorized: {\"error\":\"invalid_request\",\"error_description\":\"Invalid Credentials\"}",
    "id": "identity_manager_last_user_refresh_errors"
  },
  "time": "2022-08-24T13:45:52-04:00",
  "message": "controlplane: dropping event due to full channel"
}
{
  "level": "info",
  "service": "envoy",
  "upstream-cluster": "",
  "method": "POST",
  "authority": "example.com",
  "path": "/path/to/page",
  "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
  "referer": "example.com",
  "forwarded-for": "192.168.1.1",
  "request-id": "xxx",
  "duration": 9999.325248,
  "size": 0,
  "response-code": 500,
  "response-code-details": "ext_authz_error",
  "time": "2022-08-24T17:13:34-04:00",
  "message": "http-request"
}
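The warning "controlplane: dropping event due to full channel" suggests a producer that refuses to block once its buffer is full. As a rough illustration of that general pattern only (not Pomerium's actual implementation), here is a Python sketch in which a bounded `queue.Queue` stands in for a Go buffered channel and `put_nowait` stands in for a non-blocking send; the `publish` helper and the buffer size are made up for the example.

```python
import queue

# Hypothetical illustration of the pattern the warning describes: a producer
# publishes events into a bounded buffer using a non-blocking send, and when
# nothing drains the buffer it fills up and further events are dropped.
# This is NOT Pomerium's code; names and sizes are invented for the example.


def publish(events: "queue.Queue[str]", event: str) -> bool:
    """Try to enqueue without blocking; return False if the event was dropped."""
    try:
        events.put_nowait(event)
        return True
    except queue.Full:
        # The buffer is at capacity, so the event is discarded, not retried.
        print(f"dropping event due to full channel: {event}")
        return False


def main() -> None:
    # Small buffer with no consumer attached, so it fills after two events.
    events: "queue.Queue[str]" = queue.Queue(maxsize=2)
    results = [publish(events, f"event-{i}") for i in range(4)]
    print(results)


if __name__ == "__main__":
    main()
```

Running it prints two drop messages (for event-2 and event-3) followed by [True, True, False, False].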
Additional context
For my user account, the problems are intermittent and I can get past them.
For another user, I can reproduce the problem 100% of the time by impersonating them in the console.
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (5 by maintainers)
If Redis couldn't keep up, that would explain why Postgres seems to be behaving better.
Also, I noticed a typo in my original config: idp_refresh_directory_internal should be idp_refresh_directory_interval, and I missed that for who knows how long.
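A misspelled key like idp_refresh_directory_internal is easy to overlook. Below is a minimal sketch, assuming Python 3.9+, of how such a key could be flagged before deploying. The `KNOWN_KEYS` set contains only the keys from this report's config (with the interval key spelled correctly), not Pomerium's full option list, and `top_level_keys` and `unknown_keys` are hypothetical helpers; a simple column-0 line scan stands in for a real YAML parser.

```python
import difflib

# Assumption: only the top-level keys from this report's config.yaml, with the
# interval key spelled correctly. This is NOT Pomerium's full list of options.
KNOWN_KEYS = {
    "authenticate_service_url", "certificates", "signing_key", "metrics_address",
    "http_redirect_addr", "idp_provider", "idp_client_id", "idp_client_secret",
    "idp_service_account", "idp_refresh_directory_timeout",
    "idp_refresh_directory_interval", "cookie_secret", "shared_secret",
    "databroker_storage_type", "databroker_storage_connection_string", "policy",
}


def top_level_keys(text: str) -> list[str]:
    """Return top-level YAML keys using a simple line scan (no YAML parser)."""
    keys = []
    for line in text.splitlines():
        # Top-level keys start in column 0; skip blanks, indented lines,
        # comments, and list items.
        if not line or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.append(key.strip())
    return keys


def unknown_keys(text: str) -> dict[str, list[str]]:
    """Map each unrecognized top-level key to its closest known spelling."""
    return {
        key: difflib.get_close_matches(key, KNOWN_KEYS, n=1)
        for key in top_level_keys(text)
        if key not in KNOWN_KEYS
    }


if __name__ == "__main__":
    sample = "idp_refresh_directory_timeout: 10m\nidp_refresh_directory_internal: 20m\n"
    print(unknown_keys(sample))
```

Run against the config above, it would report idp_refresh_directory_internal with idp_refresh_directory_interval as the suggested spelling.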
I ran a Redis FLUSHALL and am waiting for a sync. I'm getting a lot of 403s, and during a large user/group sync, I believe this will appear:
"allow-why-false":["groups-unauthorized","non-pomerium-route"]
In the past, the sync from Google has taken a very long time before group policies start working.