prometheus: ErrInvalidSample is returned on duplicate labels even when the value is also duplicated
The duplicated labels are clearly a bug in DC/OS / Mesos, but rejecting them is a regression in Prometheus: when the value is duplicated as well, ignoring the duplicate label is the correct thing to do.
What did you do? Upgraded from Prometheus v2.14.0 to v2.16.0.
What did you expect to see? Successful parsing of DC/OS metrics on <IP>:61091/metrics.
What did you see instead? Under which circumstances? Prometheus now returns ErrInvalidSample ‘label name “container_id” is not unique’, marks the target as down and fails to record metrics.
Environment: DC/OS OSS 1.11.6
- System information: Linux 4.19.95-coreos x86_64
- Prometheus version:
  Before upgrade (working): prometheus, version 2.14.0 (branch: HEAD, revision: edeb7a44cbf745f1d8be4ea6f215e79e651bfe19) build user: root@df2327081015 build date: 20191111-14:27:12 go version: go1.13.4
  After upgrade (duplicate label errors): prometheus, version 2.16.0 (branch: HEAD, revision: b90be6f32a33c03163d700e1452b54454ddce0ec) build user: root@7ea0ae865f12 build date: 20200213-23:50:02 go version: go1.13.8
- Alertmanager version: N/A
- Prometheus configuration file:
...
- job_name: master-metrics
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
  - names:
    - master.mesos
    refresh_interval: 30s
    type: A
    port: 61091
...
- Alertmanager configuration file: N/A
- Logs:
level=warn ts=2020-02-20T22:52:46.911Z caller=scrape.go:972 component="scrape manager" scrape_pool=agent-metrics target=http://10.30.32.205:61091/metrics msg="append failed" err="label name \"container_id\" is not unique: invalid sample"
- Metric: the container_id label is duplicated, but so is its value (violent agreement). The same is true of executor_id and framework_id in the sample below.
# HELP net_rx_packets DC/OS Metrics Datapoint
# TYPE net_rx_packets gauge
net_rx_packets{cluster_id="33bced63-a344-4664-acac-c8f043f91da6",container_id="256aaa06-38d2-4164-b7ec-8619b628731f",container_id="256aaa06-38d2-4164-b7ec-8619b628731f",dcos_package_is_framework="false",dcos_package_name="cadvisor",dcos_package_version="0.3.0-0.27.2",dcos_service_name="cadvisor",executor_id="cadvisor.1f72bcf3-21e2-11ea-b36c-82d64e6d50ed",executor_id="cadvisor.1f72bcf3-21e2-11ea-b36c-82d64e6d50ed",executor_name="Command Executor (Task: cadvisor.1f72bcf3-21e2-11ea-b36c-82d64e6d50ed) (Command: sh -c '/usr/bin/cad...')",framework_id="0a9b8664-f7f6-41a4-93b1-01493ff62a49-0000",framework_id="0a9b8664-f7f6-41a4-93b1-01493ff62a49-0000",framework_name="marathon",framework_principal="dcos_marathon",framework_role="slave_public",hostname="10.30.35.218",mesos_id="1d24d8d1-52b7-4694-812c-d20a953180f9-S17",source="cadvisor.1f72bcf3-21e2-11ea-b36c-82d64e6d50ed",task_id="cadvisor.1f72bcf3-21e2-11ea-b36c-82d64e6d50ed",task_name="cadvisor"} 0
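To check a scraped line for this condition, something like the following could help. It is a rough sketch, not how Prometheus parses samples: `duplicate_labels` and `all_duplicates_agree` are hypothetical helper names, the regex is a simplification of the text exposition format, and the sample line is an abbreviated stand-in for the one above.

```python
import re
from collections import defaultdict

# Matches name="value" pairs; a value may contain escaped quotes or backslashes.
# Simplified assumption: this is not the full exposition-format grammar.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def duplicate_labels(sample_line):
    """Return {label_name: [values, ...]} for every label name that repeats."""
    start = sample_line.index("{")
    end = sample_line.rindex("}")
    seen = defaultdict(list)
    for name, value in LABEL_RE.findall(sample_line[start + 1:end]):
        seen[name].append(value)
    return {name: values for name, values in seen.items() if len(values) > 1}


def all_duplicates_agree(dups):
    """True when every repeated label carries one single value ("violent agreement")."""
    return all(len(set(values)) == 1 for values in dups.values())


# Abbreviated, illustrative line; the IDs are shortened placeholders.
line = 'net_rx_packets{container_id="256aaa06",container_id="256aaa06",task_name="cadvisor"} 0'
dups = duplicate_labels(line)
print(dups)                        # {'container_id': ['256aaa06', '256aaa06']}
print(all_duplicates_agree(dups))  # True
```

A line where the repeated label carries two different values would make `all_duplicates_agree` return `False`, which is the case where dropping one copy would silently lose information.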
- Marathon JIRA bug: https://jira.d2iq.com/browse/MARATHON-2381
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (12 by maintainers)
You are right.
A better workaround:
NOTE: This is a workaround for John. I do not recommend it to anyone else who hits the ‘label name "" is not unique: invalid sample’ error. Instead, you should fix the problematic exporter.
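For what "fix the problematic exporter" could look like, here is a minimal sketch of exporter-side label handling. It is purely illustrative and is not the DC/OS code: `render_sample` and `escape_label_value` are hypothetical names, and the assumption is that the exporter holds its labels as a list of (name, value) pairs before rendering. Exact repeats are collapsed and conflicting values are treated as an error rather than guessed at.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline for the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(metric, label_pairs, value):
    """Render one sample line, collapsing labels that repeat with the same value."""
    merged = {}
    for name, val in label_pairs:
        if name in merged and merged[name] != val:
            # Conflicting values cannot be merged safely, so fail loudly.
            raise ValueError(
                f"label {name!r} has conflicting values: {merged[name]!r} vs {val!r}"
            )
        merged[name] = val  # dicts keep first-insertion order
    body = ",".join(f'{n}="{escape_label_value(v)}"' for n, v in merged.items())
    return f"{metric}{{{body}}} {value}"


# Shortened, illustrative label values.
pairs = [("container_id", "256aaa06"), ("container_id", "256aaa06"), ("task_name", "cadvisor")]
print(render_sample("net_rx_packets", pairs, 0))
# net_rx_packets{container_id="256aaa06",task_name="cadvisor"} 0
```

Doing this where the labels are assembled keeps every scraper working, not only Prometheus, which is why it is preferable to a scrape-side workaround.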