vector: 401 Unauthorized for GCP related sinks after a while
After executing properly for a while on a GKE pod, both of my GCP related sinks (Stackdriver and Google Cloud Storage) start giving me 401 Unauthorized errors.
~The underlying librairies~ GcpAuthConfig/GcpCredentials don’t seem to be handling ~JWT~ access token renewal/refetch from the Metadata Server properly (since I’m not using a service account private key).
Vector Version
From the Docker image timberio/vector:0.18.1-debian
0.18.1
Vector Configuration File
[sinks.stackdriver]
type = "gcp_stackdriver_logs"
inputs = [ "stdin" ]
log_id = "logid"
project_id = "projectid"
[sinks.stackdriver.resource]
type = "global"
[sinks.gcs]
type = "gcp_cloud_storage"
inputs = [ "stdin" ]
bucket = "bucket"
key_prefix = "%F/"
compression = "gzip"
storage_class = "COLDLINE"
[sinks.gcs.encoding]
codec = "ndjson"
Actual Behavior
After a few hours of execution, the ~JWT~ access token expires and every time Vector tries to flush to its GCP related sinks I get this from stderr:
ERROR sink{component_kind="sink" component_id=gcs component_type=gcp_cloud_storage component_name=gcs}:request{request_id=6}: vector::sinks::util::retries: Not retriable; dropping the request. reason="response status: 401 Unauthorized"
...
ERROR sink{component_kind="sink" component_id=stackdriver component_type=gcp_stackdriver_logs component_name=stackdriver}:request{request_id=192}: vector::sinks::util::sink: Response failed. response=Response { status: 401, version: HTTP/1.1, headers: {"www-authenticate": "Bearer realm=\"https://accounts.google.com/\", error=\"invalid_token\"", "vary": "X-Origin", "vary": "Referer", "vary": "Origin,Accept-Encoding", "content-type": "application/json; charset=UTF-8", "date": "Mon, 10 Jan 2022 00:10:14 GMT", "server": "ESF", "cache-control": "private", "x-xss-protection": "0", "x-frame-options": "SAMEORIGIN", "x-content-type-options": "nosniff", "accept-ranges": "none", "transfer-encoding": "chunked"}, body: b"{\n \"error\": {\n \"code\": 401,\n \"message\": \"Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.\",\n \"status\": \"UNAUTHENTICATED\",\n \"details\": [\n {\n \"@type\": \"type.googleapis.com/google.rpc.ErrorInfo\",\n \"reason\": \"ACCESS_TOKEN_EXPIRED\",\n \"domain\": \"googleapis.com\",\n \"metadata\": {\n \"method\": \"google.logging.v2.LoggingServiceV2.WriteLogEntries\",\n \"service\": \"logging.googleapis.com\"\n }\n }\n ]\n }\n}\n" }
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 26 (12 by maintainers)
Hey @bruceg. I appreciate this issue has been closed for some months now; however, we are seeing this problem when trying to send data to our GCS bucket from Vector.
We have Vector running in Kubernetes, and we have found that we can send data to GCS for some time after restarting our pods, but after a while we start getting 401s (presumably because Vector has failed to refresh its token). Is there any possibility this problem has been reintroduced?
The Vector image we are using is timberio/vector:0.24.1-alpine
This is the error we’re seeing:
2022-10-04T02:34:38.228190Z ERROR sink{component_kind="sink" component_id=gcs component_type=gcp_cloud_storage component_name=gcs}:request{request_id=214}: vector::sinks::util::retries: Not retriable; dropping the request. reason="response status: 401 Unauthorized"
My thoughts too. It has not happened again yet. I’ll keep an eye on it. You can close the issue if you want and I’ll reopen if it happens again and I manage to catch more data.
I don’t think it’s related since, in this case, the GCS and Stackdriver sinks are authenticating using a token grabbed from the GKE Metadata Server here: https://github.com/vectordotdev/vector/blob/316089253076748428012292be39771b197fc4ff/src/sinks/gcp/mod.rs#L60-L86
It’s kept up-to-date with the task: https://github.com/vectordotdev/vector/blob/ea0d002f4f26522764314f176af6a0e6c3adc28c/src/sinks/gcp/mod.rs#L131-L150
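For anyone following along, here is a minimal sketch of what "kept up-to-date" could look like, assuming the token comes with an `expires_in` lifetime in seconds. This is not Vector's actual code: `CachedToken`, `needs_refresh` and `next_refresh_delay` are hypothetical names, and the "refresh at half the remaining lifetime" policy is an assumption chosen for illustration. The point is that the delay is recomputed from each token's own expiry rather than fixed once at startup.

```rust
use std::time::{Duration, Instant};

/// An access token plus the instant at which it stops being valid.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone)]
struct CachedToken {
    value: String,
    expires_at: Instant,
}

impl CachedToken {
    /// Build a token from an `expires_in` lifetime given in seconds.
    fn new(value: &str, expires_in_secs: u64, now: Instant) -> Self {
        Self {
            value: value.to_string(),
            expires_at: now + Duration::from_secs(expires_in_secs),
        }
    }

    /// True when the token has expired or will expire within `margin`.
    fn needs_refresh(&self, now: Instant, margin: Duration) -> bool {
        now + margin >= self.expires_at
    }
}

/// How long a background task should sleep before refetching.
/// Assumed policy: half of the remaining lifetime, but never less than `min`.
fn next_refresh_delay(token: &CachedToken, now: Instant, min: Duration) -> Duration {
    let remaining = token.expires_at.saturating_duration_since(now);
    std::cmp::max(remaining / 2, min)
}
```

If the refresh interval were instead computed once from the first token, a later token with a shorter remaining lifetime could expire before the next refresh fires. Whether that is what happens here is only a guess.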
FYI: It has now been running in production for 2 days without any 401 error. It makes me think that it might be related to some sporadic and uncaught error with the Metadata Server.
Also, prior to the 401 errors I had, there are no signs in the logs that there had been a problem with the token renewal (nothing in the logs that says “Failed to update GCP authentication token.”)
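Since the 401 is currently logged as "Not retriable; dropping the request", one possible mitigation would be to treat a 401 as a signal to refetch the token and retry once. The sketch below only illustrates that idea and is not how Vector's retry logic is implemented: `Outcome`, `send_with_reauth`, `fetch_token` and `send` are all hypothetical stand-ins for the sink request and the Metadata Server call.

```rust
/// Outcome of one request attempt, reduced to what this sketch needs.
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    Unauthorized, // HTTP 401
    Other(u16),   // any other status code
}

/// Send once; on a 401, refetch the token and retry exactly one more time.
/// `fetch_token` and `send` are hypothetical closures standing in for the
/// Metadata Server call and the sink request.
fn send_with_reauth<F, S>(token: &mut String, mut fetch_token: F, mut send: S) -> Outcome
where
    F: FnMut() -> Result<String, String>,
    S: FnMut(&str) -> Outcome,
{
    match send(token.as_str()) {
        Outcome::Unauthorized => match fetch_token() {
            Ok(fresh) => {
                // Replace the stale token, then retry a single time.
                *token = fresh;
                send(token.as_str())
            }
            Err(err) => {
                // Log the refresh failure so it is visible before the 401s pile up.
                eprintln!("Failed to update GCP authentication token: {}", err);
                Outcome::Unauthorized
            }
        },
        other => other,
    }
}
```

Retrying only once avoids looping forever when the credentials are genuinely invalid rather than merely expired.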