cloudwatch_exporter: Prometheus is failing to pull metrics from the latest image

Yesterday, one of my CloudWatch exporters was restarted, and because latest is the only tag available, my cluster pulled the image pushed to Docker Hub 6 days ago. The previous image was running without any issues.

[Screenshot: 2018-06-25 10:36:27]

Since the container restarted, Prometheus has been rejecting the exporter's samples with an "out of bounds" error, which Prometheus returns when a sample's timestamp is older than the earliest time its TSDB will currently accept.

Prometheus logs

time="2018-06-25T08:53:16Z" level=error msg="append failed" err="out of bounds" 
source="scrape.go:518" target="{__address__="10.101.90.4:9100", 
__metrics_path__="/metrics", __scheme__="http", endpoint="0", 
instance="10.101.90.4:9100", job="cloudwatch-exporter", 
namespace="monitoring", pod="cloudwatch-exporter-2869121175-zvctm",
 service="cloudwatch-exporter"}" 

CloudWatch exporter metrics endpoint

# HELP cloudwatch_requests_total API requests made to CloudWatch
# TYPE cloudwatch_requests_total counter
cloudwatch_requests_total 1645.0
# HELP aws_rds_free_storage_space_average CloudWatch metric AWS/RDS FreeStorageSpace Dimensions: [DBInstanceIdentifier] Statistic: Average Unit: Bytes
# TYPE aws_rds_free_storage_space_average gauge
aws_rds_free_storage_space_average{job="aws_rds",instance="",dbinstance_identifier="ooo",} 5.7720643584E10 1529914980000
aws_rds_free_storage_space_average{job="aws_rds",instance="",dbinstance_identifier="bbbb",} 2.6073378816E10 1529914980000
# HELP aws_ebs_burst_balance_average CloudWatch metric AWS/EBS BurstBalance Dimensions: [VolumeId] Statistic: Average Unit: Percent
# TYPE aws_ebs_burst_balance_average gauge
aws_ebs_burst_balance_average{job="aws_ebs",instance="",volume_id="vol-2222",} 100.0 1529914800000
# HELP aws_ec2_status_check_failed_average CloudWatch metric AWS/EC2 StatusCheckFailed Dimensions: [InstanceId] Statistic: Average Unit: Count
# TYPE aws_ec2_status_check_failed_average gauge
aws_ec2_status_check_failed_average{job="aws_ec2",instance="",instance_id="i-222",} 0.0 1529914980000
# HELP aws_ec2_status_check_failed_instance_average CloudWatch metric AWS/EC2 StatusCheckFailed_Instance Dimensions: [InstanceId] Statistic: Average Unit: Count
# TYPE aws_ec2_status_check_failed_instance_average gauge
aws_ec2_status_check_failed_instance_average{job="aws_ec2",instance="",instance_id="i-222",} 0.0 1529914980000
# HELP aws_ec2_status_check_failed_system_average CloudWatch metric AWS/EC2 StatusCheckFailed_System Dimensions: [InstanceId] Statistic: Average Unit: Count
# TYPE aws_ec2_status_check_failed_system_average gauge
aws_ec2_status_check_failed_system_average{job="aws_ec2",instance="",instance_id="i-222",} 0.0 1529914980000
# HELP aws_ses_send_sum CloudWatch metric AWS/SES Send Dimensions: null Statistic: Sum Unit: Count
# TYPE aws_ses_send_sum gauge
aws_ses_send_sum{job="aws_ses",instance="",} 1781.0 1529828640000
# HELP aws_ses_delivery_sum CloudWatch metric AWS/SES Delivery Dimensions: null Statistic: Sum Unit: Count
# TYPE aws_ses_delivery_sum gauge
aws_ses_delivery_sum{job="aws_ses",instance="",} 1762.0 1529828640000
# HELP aws_ses_bounce_sum CloudWatch metric AWS/SES Bounce Dimensions: null Statistic: Sum Unit: Count
# TYPE aws_ses_bounce_sum gauge
aws_ses_bounce_sum{job="aws_ses",instance="",} 15.0 1529828640000
# HELP cloudwatch_exporter_scrape_duration_seconds Time this CloudWatch scrape took, in seconds.
# TYPE cloudwatch_exporter_scrape_duration_seconds gauge
cloudwatch_exporter_scrape_duration_seconds 2.595850326
# HELP cloudwatch_exporter_scrape_error Non-zero if this scrape failed.
# TYPE cloudwatch_exporter_scrape_error gauge
cloudwatch_exporter_scrape_error 0.0
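
The aws_ses_* samples above carry explicit timestamps roughly a day older than the scrape (1529828640000 ms ≈ 2018-06-24 08:24 UTC, versus the 2018-06-25T08:53Z scrape in the log), which is consistent with the "out of bounds" append errors. As a rough illustration (not from the issue), here is a minimal Python sketch that computes the age of each exposed sample; it assumes the /metrics output is pasted into EXPOSITION and that label values contain no spaces:

import time

# Minimal staleness check for exposition-format samples with explicit timestamps.
# Paste the exporter's /metrics output into EXPOSITION (or read it from a file).
EXPOSITION = """\
aws_rds_free_storage_space_average{job="aws_rds",instance="",dbinstance_identifier="ooo",} 5.7720643584E10 1529914980000
aws_ses_send_sum{job="aws_ses",instance="",} 1781.0 1529828640000
"""

now_ms = time.time() * 1000
for line in EXPOSITION.splitlines():
    if line.startswith("#") or not line.strip():
        continue  # skip HELP/TYPE comments and blank lines
    parts = line.rsplit(" ", 2)  # series, value, optional timestamp (ms)
    if len(parts) == 3 and parts[2].isdigit():
        age_h = (now_ms - int(parts[2])) / 1000 / 3600
        print(f"{parts[0]}  ->  sample is {age_h:.1f} hours old")

Run against the full output, samples that are many hours old (like the SES ones here) are likely to be the ones Prometheus 2.x rejects as out of bounds.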

Note: I’m using Prometheus 2.0.0-alpha.2.
Note 2: Could this project tag images before pushing them to Docker Hub?
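
As a quick way to see what is currently published, a minimal sketch that lists the available tags; the hub.docker.com v2 repositories API path and the prom/cloudwatch-exporter repository name are assumptions, so adjust them if the image your cluster pulls lives elsewhere:

import json
import urllib.request

# List the tags published for the exporter image on Docker Hub.
# The repository name and the v2 API path below are assumptions;
# change them to match the image your cluster actually pulls.
URL = "https://hub.docker.com/v2/repositories/prom/cloudwatch-exporter/tags/"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

for tag in data.get("results", []):
    print(tag.get("name"), tag.get("last_updated"))

If this prints only latest, there is no way to pin a deployment to a known-good build, which is the problem described above.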

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 19 (10 by maintainers)

Most upvoted comments

I can confirm @gianrubio's statement.

Now metrics collection is working again.

I would also suggest creating tags on Docker Hub like you do on GitHub. Otherwise we cannot pin a specific image version and risk running broken images in production.

I don’t think so; as I said before, this only happens with the latest image. Rolling back to this commit fixed the issue.