opentelemetry-collector-contrib: Prometheus Receiver - Some counter metrics dropped for an unknown reason

Describe the bug

While analyzing an OpenTelemetry Collector setup and building dashboards from the collected metrics, I discovered that some metrics are dropped for an unknown reason. A number of metrics are “dropped” without any related logs or traces. Eventually I found some log entries like this one:

info internal/metrics_adjuster.go:357 Adjust - skipping unexpected point {"kind": "receiver", "name": "prometheus", "type": "UNSPECIFIED"}

So they seem to be dropped because their type is unspecified. I added some logging to metricFamily.go to visualize their metadata, and it is empty.

For example, the following metrics disappear completely between the receiver and the exporter:

otel-collector    | cortex_deprecated_flags_inuse_total {Metric: Type: Help: Unit:}
otel-collector    | cortex_experimental_features_in_use_total {Metric: Type: Help: Unit:}
otel-collector    | go_memstats_alloc_bytes_total {Metric: Type: Help: Unit:}
otel-collector    | go_memstats_frees_total {Metric: Type: Help: Unit:}
otel-collector    | go_memstats_lookups_total {Metric: Type: Help: Unit:}
otel-collector    | go_memstats_mallocs_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_admin_user_created_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_dashboard_snapshot_create_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_dashboard_snapshot_external_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_dashboard_snapshot_get_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_login_oauth_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_login_post_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_login_saml_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_models_dashboard_insert_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_org_create_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_response_status_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_user_signup_completed_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_user_signup_invite_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_api_user_signup_started_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_aws_cloudwatch_get_metric_data_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_aws_cloudwatch_get_metric_statistics_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_aws_cloudwatch_list_metrics_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_datasource_request_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_db_datasource_query_by_id_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_emails_sent_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_instance_start_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_page_response_status_total {Metric: Type: Help: Unit:}
otel-collector    | grafana_proxy_response_status_total {Metric: Type: Help: Unit:}
otel-collector    | http_request_total {Metric: Type: Help: Unit:}
otel-collector    | loki_logql_querystats_duplicates_total {Metric: Type: Help: Unit:}
otel-collector    | loki_logql_querystats_ingester_sent_lines_total {Metric: Type: Help: Unit:}
otel-collector    | process_cpu_seconds_total {Metric: Type: Help: Unit:}

I had to add custom logging to understand this: these metrics seem to be dropped because their Type is unspecified and metricFamily.go has no metadata for them.

For an unknown reason, some other metrics ending with _total work fine, for example:

node_cpu_seconds_total{cpu="0",mode="idle"}
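One mechanism that could produce this pattern (an assumption, not confirmed anywhere in this thread) is a metadata lookup keyed by metric family name. In the OpenMetrics exposition format, a counter family declared as "# TYPE foo counter" is exposed through the sample foo_total, so looking up the sample name verbatim finds nothing and the type stays unspecified. Note that a scraper may negotiate OpenMetrics while a manual request to the same endpoint receives the classic text format, so what you see with curl can differ from what the receiver parses. The sketch below is hypothetical: the metadata type, the lookup function and its suffix-trimming fallback are illustrative names, not the receiver's code in metricFamily.go.

```go
package main

import (
	"fmt"
	"strings"
)

// metadata is a hypothetical stand-in for the per-family metadata
// (type, help, unit) a scraper records from # TYPE / # HELP lines.
type metadata struct {
	Metric string
	Type   string
	Help   string
	Unit   string
}

// lookup returns metadata for a sample name. It first tries an exact match
// and, if that fails, retries with a trailing "_total" removed, which is how
// an OpenMetrics counter family name relates to its sample name.
// The fallback is an illustrative assumption, not the receiver's code.
func lookup(store map[string]metadata, sampleName string) (metadata, bool) {
	if md, ok := store[sampleName]; ok {
		return md, true
	}
	if trimmed := strings.TrimSuffix(sampleName, "_total"); trimmed != sampleName {
		md, ok := store[trimmed]
		return md, ok
	}
	return metadata{}, false
}

func main() {
	// OpenMetrics-style family "# TYPE go_memstats_frees counter",
	// exposed as the sample go_memstats_frees_total (illustrative data).
	store := map[string]metadata{
		"go_memstats_frees": {Metric: "go_memstats_frees", Type: "counter"},
	}

	// An exact lookup on the sample name finds nothing: empty metadata.
	_, exact := store["go_memstats_frees_total"]
	fmt.Println("exact match:", exact)

	// The hypothetical fallback recovers the family's type.
	md, ok := lookup(store, "go_memstats_frees_total")
	fmt.Println("with fallback:", ok, md.Type)
}
```

Running it prints "exact match: false" followed by "with fallback: true counter". The exact-match miss is the case that would leave Type, Help and Unit empty, as in the log lines above.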

Steps to reproduce

I don’t know exactly… Try scraping the Grafana metrics endpoint at http://grafana:3000/metrics.

What did you expect to see?

Metrics should be kept internally and then be visible on the exporter side.

What did you see instead?

No metrics on the exporter side. They are probably dropped in metrics_adjuster.go (a log line that includes the metric name is definitely missing there).

What version did you use?

0.25

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 29 (20 by maintainers)


Most upvoted comments

@chzhuo @baez90 your problem is different. It’s related to this: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/20518

Related discussion: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/21743

Workaround in 0.78.0: run the collector with --feature-gates=-pkg.translator.prometheus.NormalizeName. The default will be reverted in 0.80.0.

@gillg I am facing a similar issue with the Prometheus receiver. Did you find a workaround that worked for you?

Hello, unfortunately nothing for now. It’s definitely not systematic and doesn’t affect most metrics, but it happens a lot in some contexts, such as Grafana metrics.

We are seeing the same thing. Even a counter as simple as this one, which is present on the endpoint the Prometheus receiver is scraping, does not appear on the exporter side:

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 7368.08
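As a quick check that the metadata is present at the source, here is a minimal, hypothetical Go sketch that extracts the HELP and TYPE lines from the snippet above. The names parseMetadata and family are illustrative, not part of the collector. It handles only these metadata lines and skips samples, labels, escaping and any OpenMetrics specifics a real parser must deal with.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// family holds the metadata declared by # HELP and # TYPE lines.
type family struct {
	Help string
	Type string
}

// parseMetadata is a minimal, hypothetical parser for the metadata lines of
// the Prometheus text exposition format, keyed by the name on each line.
func parseMetadata(exposition string) map[string]family {
	families := map[string]family{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		// "# HELP name text..." or "# TYPE name type": keep the rest as one field.
		fields := strings.SplitN(strings.TrimSpace(sc.Text()), " ", 4)
		if len(fields) < 4 || fields[0] != "#" {
			continue // sample lines and malformed comments are ignored
		}
		f := families[fields[2]]
		switch fields[1] {
		case "HELP":
			f.Help = fields[3]
		case "TYPE":
			f.Type = fields[3]
		default:
			continue // other comments carry no metadata
		}
		families[fields[2]] = f
	}
	return families
}

func main() {
	exposition := `# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 7368.08`

	md := parseMetadata(exposition)["process_cpu_seconds_total"]
	fmt.Printf("type=%q help=%q\n", md.Type, md.Help)
}
```

This prints type="counter" help="Total user and system CPU time spent in seconds.", so for this payload the type is declared under the same name as the sample. If the receiver still reports an empty type, the gap is between what it actually scraped and how it looked the metadata up, not in this text.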