opentelemetry-collector-contrib: Cannot get PrometheusRemoteWriteExporter to retry on failure
Hi,
Describe the bug
I have an issue with the prometheusremotewriteexporter: I cannot get it to retry on errors.
Every error during export is immediately treated as a permanent error, and the data is dropped without any retry.
In my test setup I am using promscale with timescaledb as the backend, and sending metrics works fine as long as the endpoint is up.
Steps to reproduce
- Set up prometheusremotewriteexporter to send metrics to a remote_write Prometheus store, e.g. promscale.
- Notice that no errors occur and the metrics are successfully transmitted.
- Temporarily disable the promscale endpoint, e.g. by putting an nginx reverse proxy in between and reconfiguring it to return status code 503 (Service Temporarily Unavailable).
- Notice that otel-collector logs that the export failed, that the error is not retryable, and that it is dropping the data.
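The failing endpoint from the steps above can also be simulated without nginx. The following is a minimal sketch, not part of the original setup: Always503Handler, start_stub and post_status are hypothetical names, and the stub simply answers every POST with 503. It binds to 127.0.0.1, so inside a Docker network the bind address would need to be adjusted.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always503Handler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the reconfigured proxy: rejects every POST with 503."""

    def do_POST(self):
        # Drain the request body so the client is not left mid-write.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"503 Service Temporarily Unavailable\n"
        self.send_response(503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Suppress per-request logging to keep the output readable.
        pass


def start_stub(port=0):
    """Start the stub on a background thread; port=0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), Always503Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_status(url, payload=b"x"):
    """POST payload to url and return the HTTP status code."""
    # Empty ProxyHandler: ignore any http_proxy environment settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling start_stub(9201) would mirror the port used in this report; pointing the exporter's endpoint at that address should then produce the same 503 responses as the reconfigured proxy.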
What did you expect to see?
I expected to see a log statement saying that exporting failed, that a temporary error occurred, and that transmitting the data will be retried soon, followed by an actual retry of the send.
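The expected behavior can be sketched as follows. This is only an illustration of the expectation described above, not the collector's implementation: the names (PermanentError, is_retryable, send_with_retry) and the interval defaults are hypothetical, and treating every 5xx status as retryable is an assumption.

```python
import time


class PermanentError(Exception):
    """Hypothetical error type for failures that a retry cannot fix."""


def is_retryable(status):
    # Assumption: any 5xx means the backend is temporarily unavailable.
    return 500 <= status <= 599


def send_with_retry(send, max_attempts=5, initial_interval=0.5,
                    max_interval=5.0, sleep=time.sleep):
    """Call send() until it returns a 2xx status.

    Retryable statuses are retried with exponential backoff; any other
    non-2xx status raises PermanentError immediately.
    Returns the number of attempts that were needed.
    """
    interval = initial_interval
    for attempt in range(1, max_attempts + 1):
        status = send()
        if 200 <= status <= 299:
            return attempt
        if not is_retryable(status):
            raise PermanentError(f"HTTP status {status} is not retryable")
        if attempt < max_attempts:
            sleep(interval)
            # Double the wait each time, capped at max_interval.
            interval = min(interval * 2, max_interval)
    raise TimeoutError(f"gave up after {max_attempts} attempts")
```

With this classification, a 503 from the proxy would lead to a delayed retry, while something like a 400 would be dropped right away.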
What version did you use?
opentelemetry-collector/0.56.0 Official Docker image: otel/opentelemetry-collector:latest ImageId: 1264713dde81
What config did you use?
The most basic version:
exporters:
  prometheusremotewrite:
    endpoint: "http://nginx:9201/write"
I also tried to enable retry_on_failure manually, even though the default should be true anyway.
exporters:
  prometheusremotewrite:
    endpoint: "http://nginx:9201/write"
    retry_on_failure:
      enabled: true
Environment
Docker
Additional context
At first I had the otel-collector exporting directly to promscale and then stopped the promscale docker container, but this did not trigger a retry either.
To rule out that the failure was being classified as a permanent error because of the error type, I then added an nginx reverse proxy that lets me return status code 503 for testing purposes.
working nginx.conf in the beginning:
server {
    listen 9201;
    location / {
        proxy_pass http://promscale:9201;
    }
}
server {
    listen 9202;
    location / {
        proxy_pass http://promscale:9202;
    }
}
Updated nginx.conf that returns 503 Service Temporarily Unavailable:
server {
    listen 9201;
    location / {
        return 503;
    }
}
server {
    listen 9202;
    location / {
        return 503;
    }
}
After updating the nginx.conf, I instructed nginx to reload the config with nginx -s reload, which leads to the error:
nginx | 172.16.4.2 - - [08/Aug/2022:09:41:05 +0000] "POST /write HTTP/1.1" 503 197 "-" "opentelemetry-collector/0.56.0" "-"
collector-rs | 2022-08-08T09:41:05.280Z error exporterhelper/queued_retry.go:183 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 503 Service Temporarily Unavailable; err = %!w(<nil>): <html>\r\n<head><title>503 Service Temporarily Unavailable</title></head>\r\n<body>\r\n<center><h1>503 Service Temporarily Unavailable</h1></center>\r\n<hr><center>nginx/1.23.1</center>\r\n</body>\r\n</html>\r\n", "dropped_items": 6}
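To confirm from the log which status the exporter saw and how it was classified, the structured tail of the collector line can be parsed. A minimal sketch, assuming the line layout shown above (free text followed by a single JSON object); parse_fields is a hypothetical helper, not part of the collector.

```python
import json

# The collector log line from this report, copied verbatim (raw string keeps the \r\n escapes).
LOG_LINE = r'collector-rs | 2022-08-08T09:41:05.280Z error exporterhelper/queued_retry.go:183 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 503 Service Temporarily Unavailable; err = %!w(<nil>): <html>\r\n<head><title>503 Service Temporarily Unavailable</title></head>\r\n<body>\r\n<center><h1>503 Service Temporarily Unavailable</h1></center>\r\n<hr><center>nginx/1.23.1</center>\r\n</body>\r\n</html>\r\n", "dropped_items": 6}'


def parse_fields(line):
    """Return the trailing JSON object of a collector log line as a dict."""
    # Assumption: the first "{" in the line starts the structured fields.
    start = line.index("{")
    return json.loads(line[start:])


fields = parse_fields(LOG_LINE)
print(fields["name"], fields["dropped_items"])
print(fields["error"].split(":")[0])
```

The parsed error field shows the 503 response wrapped as a "Permanent error", which is why the data is dropped instead of being queued for a retry.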
Do you have any pointers on what I am doing wrong that could have broken this core feature? Thank you!
Take care, Martin
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (12 by maintainers)
Hi guys, I am trying to work on this. I’ve submitted a PR too. Thanks