prometheus-engine: One or more TimeSeries could not be written: The new labels would cause the metric XXX to have over 100 labels.

Discovered this error message today while looking into why some metrics were missing:

ts=2023-06-19T14:27:27.934Z caller=export.go:827 level=error
component=gcm_exporter msg="send batch" size=200 err="rpc error: code =
InvalidArgument desc = One or more TimeSeries could not be written: The new
labels would cause the metric
prometheus.googleapis.com/container_spec_memory_reservation_limit_bytes/gauge
to have over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_last_seen/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_fs_reads_bytes_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_swap/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_limit_bytes/gauge to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_cpu_usage_seconds_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_mapped_file/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_working_set_bytes/gauge to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_working_set_bytes/gauge to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_shares/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_swap_limit_bytes/gauge to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_shares/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_shares/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_last_seen/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_period/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_working_set_bytes/gauge to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_reservation_limit_bytes/gauge
to have over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_fs_reads_bytes_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_fs_writes_bytes_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_fs_reads_total/counter to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_mapped_file/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_max_usage_bytes/gauge to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_rss/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_period/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_shares/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_swap/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_cpu_usage_seconds_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_start_time_seconds/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_cpu_system_seconds_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_swap/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_fs_reads_bytes_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_cpu_system_seconds_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_shares/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_swap_limit_bytes/gauge to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_swap_limit_bytes/gauge to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_rss/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_memory_rss/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_memory_cache/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_reservation_limit_bytes/gauge
to have over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_reservation_limit_bytes/gauge
to have over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failcnt/counter to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_working_set_bytes/gauge to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_swap/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_cpu_user_seconds_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_usage_bytes/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_memory_reservation_limit_bytes/gauge
to have over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_cpu_user_seconds_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_spec_cpu_shares/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_swap/gauge to have over 100 labels.;
The new labels would cause the metric
prometheus.googleapis.com/container_memory_working_set_bytes/gauge to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_start_time_seconds/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_usage_bytes/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_fs_writes_total/counter to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_blkio_device_usage_total/counter to have
over 100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_start_time_seconds/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_cache/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_fs_writes_bytes_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_oom_events_total/counter to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failcnt/counter to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_failures_total/counter to have over
100 labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_usage_bytes/gauge to have over 100
labels.; The new labels would cause the metric
prometheus.googleapis.com/container_memory_cache/gauge to have over 100
labels.\nerror details: name = Unknown  desc = total_point_count:200
success_point_count:108  errors:{status:{code:3}  point_count:92}"

Any way to debug this?
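
A first step that needs nothing beyond the log itself is to collapse the wall of text into the distinct metric types and how often each was rejected. The following is only a minimal sketch: the function name `offending_metrics` is made up for illustration, and the pattern assumes every clause has the exact wording seen in the error above.

```python
import re
from collections import Counter

# One clause per rejected series, in the wording seen in the error above:
# "... would cause the metric <metric type> to have over 100 labels."
# \s+ is used between words because the log may be hard-wrapped.
CLAUSE = re.compile(r"would cause the metric\s+(\S+)\s+to have over\s+100 labels")


def offending_metrics(log_text: str) -> Counter:
    """Count how often each metric type appears in the error text."""
    return Counter(CLAUSE.findall(log_text))


# Tiny excerpt in the same shape as the log above (wrapped on purpose).
sample = (
    "The new labels would cause the metric\n"
    "prometheus.googleapis.com/container_last_seen/gauge to have over 100 labels.; "
    "The new labels would cause the metric "
    "prometheus.googleapis.com/container_memory_rss/gauge to have over\n"
    "100 labels.; The new labels would cause the metric\n"
    "prometheus.googleapis.com/container_last_seen/gauge to have over 100 labels."
)

for metric, count in offending_metrics(sample).most_common():
    print(f"{count:3d}  {metric}")
```

This does not say which scrape config is responsible, but it gives a short list of metric types to inspect.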

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19

Most upvoted comments

Errors are gone. Thanks a lot for the help @bwplotka and @pintohutch!

Yep - you are correct that dynamic relabeling is not supported for managed kubelet/cadvisor scraping, see #342.

Deleting the offending metric descriptors is indeed the recommended fix, see #325 for details.
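
For illustration, a hedged sketch of what such a delete request could look like against the Cloud Monitoring v3 REST API (`projects.metricDescriptors.delete`). The helper name `delete_request`, the project ID, and the token value are placeholders, not something from this thread. The request is only constructed here, never sent, because deleting a descriptor is destructive; read #325 before running anything like it.

```python
import urllib.request


def delete_request(project_id: str, metric_type: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one metric descriptor.

    metric_type is the full type as printed in the error, e.g.
    "prometheus.googleapis.com/container_last_seen/gauge".
    """
    url = (
        "https://monitoring.googleapis.com/v3/"
        f"projects/{project_id}/metricDescriptors/{metric_type}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        # Assumption: an OAuth access token with permission to delete metric
        # descriptors, e.g. obtained via `gcloud auth print-access-token`.
        headers={"Authorization": f"Bearer {token}"},
    )


req = delete_request(
    "PROJECT_ID",  # placeholder
    "prometheus.googleapis.com/container_last_seen/gauge",
    "TOKEN",  # placeholder
)
print(req.get_method(), req.full_url)
# To actually send it (destructive!): urllib.request.urlopen(req)
```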

Side-note: you could also get label information from the /api/v1/labels Prometheus API in GMP, e.g. https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus/api/v1/labels?match[]={__name__="container_spec_memory_reservation_limit_bytes"}
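
Since the selector contains braces, quotes, and brackets, it is easiest to let a library do the URL encoding. A small sketch, assuming Python's standard library; the helper name `labels_url` is hypothetical and `PROJECT_ID` is a placeholder as in the URL above:

```python
from urllib.parse import urlencode


def labels_url(project_id: str, metric_name: str) -> str:
    """Build the GMP /api/v1/labels URL for a single metric name."""
    base = (
        "https://monitoring.googleapis.com/v1/"
        f"projects/{project_id}/location/global/prometheus/api/v1/labels"
    )
    # Prometheus series selector, same form as in the URL above.
    selector = f'{{__name__="{metric_name}"}}'
    return f"{base}?{urlencode({'match[]': selector})}"


url = labels_url("PROJECT_ID", "container_spec_memory_reservation_limit_bytes")
print(url)
# The request itself still needs an "Authorization: Bearer <token>" header;
# how the token is obtained is outside this sketch.
```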

Actually, are you sure you need to relabel the automatically scraped cadvisor metrics? I don’t think you have to. The dynamic labels you don’t want probably came from your manual installation with this custom flag set to true. In the worst case, you might need to delete the existing metrics (those from the manual installation) to get new ones flowing.

Hey @moritzschmitz-oviva,

You may find the GCP API explorer helpful here to poke around and catch problematic labels.

For example, using the metricDescriptors.list call with a query like the following could yield some useful results:

[image: screenshot of the metricDescriptors.list query in the API explorer]

Hope this helps!
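
The same call can also be scripted. The sketch below is not the query from the screenshot (that one is not reproduced here); it assumes a `starts_with` filter on `metric.type` and the documented response shape of `metricDescriptors.list`, where each descriptor carries a `type` and a `labels` list with `key` entries. The helper names are made up for illustration.

```python
from urllib.parse import urlencode


def descriptors_url(project_id: str, type_prefix: str) -> str:
    """Build a metricDescriptors.list URL filtered by metric type prefix."""
    flt = f'metric.type = starts_with("{type_prefix}")'
    return (
        "https://monitoring.googleapis.com/v3/"
        f"projects/{project_id}/metricDescriptors?{urlencode({'filter': flt})}"
    )


def label_keys(response: dict) -> dict[str, list[str]]:
    """Map each metric type in a list response to its sorted label keys."""
    return {
        d["type"]: sorted(label["key"] for label in d.get("labels", []))
        for d in response.get("metricDescriptors", [])
    }


def crowded(response: dict, threshold: int = 90) -> list[tuple[str, int]]:
    """Metric types with at least `threshold` labels, most crowded first."""
    counts = [(t, len(keys)) for t, keys in label_keys(response).items()]
    return sorted((c for c in counts if c[1] >= threshold), key=lambda c: -c[1])


# Hand-written stand-in for a parsed JSON response, not real data.
fake_response = {
    "metricDescriptors": [
        {
            "type": "prometheus.googleapis.com/container_last_seen/gauge",
            "labels": [{"key": "image"}, {"key": "id"}, {"key": "name"}],
        },
        {"type": "prometheus.googleapis.com/up/gauge"},
    ]
}

print(descriptors_url("PROJECT_ID", "prometheus.googleapis.com/container_"))
print(label_keys(fake_response))
print(crowded(fake_response, threshold=3))
```

Scanning the label keys of a descriptor that is close to the limit usually makes the unwanted dynamic labels stand out.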

I wish the collector would spit out which scrape config is causing this.