opentelemetry-collector-contrib: Cannot get PrometheusRemoteWriteExporter to retry on failure

Hi,

Describe the bug

I have an issue with the prometheusremotewriteexporter in that I cannot get it to retry on errors. Every error during export is immediately treated as a permanent error and the data is dropped without any retry.

In my test-setup I am using promscale with timescaledb as the backend and sending metrics works fine.

Steps to reproduce

  1. Set up prometheusremotewriteexporter to send metrics to a Prometheus remote_write store, e.g. promscale.
  2. Notice that no errors occur and the metrics are successfully transmitted
  3. Temporarily disable the promscale endpoint, e.g. by putting an nginx reverse proxy in between and reconfiguring it to return status code 503 (Service Temporarily Unavailable)
  4. Notice that otel-collector logs that the exporting failed, the error is not retryable and it’s dropping the data.

What did you expect to see?

I expected to see a log statement saying that exporting failed, that a temporary error occurred and that transmitting the data will be retried soon, followed by an actual retry of the send.

What version did you use?

opentelemetry-collector/0.56.0, official Docker image otel/opentelemetry-collector:latest (ImageId: 1264713dde81)

What config did you use?

The most basic version:

exporters:
  prometheusremotewrite:
    endpoint: "http://nginx:9201/write"

I also tried enabling retry_on_failure explicitly, even though it should default to true anyway:

exporters:
  prometheusremotewrite:
    endpoint: "http://nginx:9201/write"
    retry_on_failure:
      enabled: true

Environment

Docker

Additional context

First I had the otel-collector exporting directly to promscale and then stopped the promscale docker container, but this also didn’t work.

To make sure the failure could not legitimately be classified as a permanent error, I then added an nginx reverse proxy that lets me return status code 503 for testing purposes.

working nginx.conf in the beginning:

server {
  listen 9201;

  location / {
    proxy_pass http://promscale:9201;
  }
}

server {
  listen 9202;

  location / {
    proxy_pass http://promscale:9202;
  }
}

updated nginx.conf that returns 503 Service Temporarily Unavailable:

server {
  listen 9201;

  location / {
    return 503;
  }
}

server {
  listen 9202;

  location / {
    return 503;
  }
}

After updating nginx.conf, I told nginx to reload the config with nginx -s reload, which leads to the following error:

nginx                 | 172.16.4.2 - - [08/Aug/2022:09:41:05 +0000] "POST /write HTTP/1.1" 503 197 "-" "opentelemetry-collector/0.56.0" "-"
collector-rs          | 2022-08-08T09:41:05.280Z        error   exporterhelper/queued_retry.go:183      Exporting failed. The error is not retryable. Dropping data.    {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 503 Service Temporarily Unavailable; err = %!w(<nil>): <html>\r\n<head><title>503 Service Temporarily Unavailable</title></head>\r\n<body>\r\n<center><h1>503 Service Temporarily Unavailable</h1></center>\r\n<hr><center>nginx/1.23.1</center>\r\n</body>\r\n</html>\r\n", "dropped_items": 6}

Do you have any pointers as to what I am doing wrong that could have broken this core feature? Thank you!

Take care, Martin

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (12 by maintainers)

Most upvoted comments

Hi guys, I am trying to work on this. I’ve submitted a PR too. Thanks