yet-another-cloudwatch-exporter: [BUG] ecs-svc discovery includes all services in a cluster, and metric labels come from an arbitrary service

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Currently, if any service in an ECS cluster matches the searchTags in an ecs-svc discovery job, metrics for all services in that ECS cluster will be scraped.

When exportedTagsOnMetrics is specified, the tags from the matched service will be associated with metrics from all services. If multiple services in an ECS cluster match the searchTags, tags from an arbitrary service will be applied to all metrics.

Expected Behavior

If only one service in an ECS cluster matches the searchTags (and the cluster itself does not match), only that service should have its metrics scraped.

If multiple services match the searchTags, each ECS service's own tags should be applied to its own metrics.

Steps To Reproduce

  1. Create an ECS cluster with two services, tagged with Service=<something> and Role=<something different per service>
  2. Configure yace to scrape ECS services with Service=<something> and Role=<one value>, and to copy the Service and Role tags to metrics:
discovery:
  exportedTagsOnMetrics:
    ecs-svc:
    - Role
    - Service
  jobs:
  - type: ecs-svc
    regions:
    - ap-southeast-2
    searchTags:
    - key: Service
      value: <something>
    - key: Role
      value: <one of the role values>
    length: 1200
    period: 60
    metrics:
    - name: CPUUtilization
      statistics: [Maximum]
      nilToZero: true

  3. Check the metrics. Only the matching ECS service is present in aws_ecs_svc_info, but aws_ecs_svc_cpuutilization_maximum also includes the other ECS service, with the wrong name and tag labels (though the dimension_ServiceName label is correct):
# HELP aws_ecs_svc_cpuutilization_maximum Help is not implemented yet.
# TYPE aws_ecs_svc_cpuutilization_maximum gauge
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="matching-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",region="ap-southeast-2",tag_Role="<one of the role values>",tag_Service="<something>"} 5.492108138082059
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="other-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",region="ap-southeast-2",tag_Role="<one of the role values>",tag_Service="<something>"} 13.422734092626918
# HELP aws_ecs_svc_info Help is not implemented yet.
# TYPE aws_ecs_svc_info gauge
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",tag_Role="<one of the role values>",tag_Service="<something>"} 0
  4. Remove Role from searchTags:

discovery:
  exportedTagsOnMetrics:
    ecs-svc:
    - Role
    - Service
  jobs:
  - type: ecs-svc
    regions:
    - ap-southeast-2
    searchTags:
    - key: Service
      value: <something>
    length: 1200
    period: 60
    metrics:
    - name: CPUUtilization
      statistics: [Maximum]
      nilToZero: true

  5. Check the metrics again. Both ECS services are now present in aws_ecs_svc_info with the correct tags (in my case the ECS cluster itself also matches), but all aws_ecs_svc_cpuutilization_maximum series carry the name and tags of a single, arbitrary matching service:
# HELP aws_ecs_svc_cpuutilization_maximum Help is not implemented yet.
# TYPE aws_ecs_svc_cpuutilization_maximum gauge
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="matching-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",region="ap-southeast-2",tag_Role="<other role value>",tag_Service="<something>"} 5.550455194040153
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="other-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",region="ap-southeast-2",tag_Role="<other role value>",tag_Service="<something>"} 14.431799231013773
# HELP aws_ecs_svc_info Help is not implemented yet.
# TYPE aws_ecs_svc_info gauge
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:cluster/my-cluster-name",tag_Role="<cluster role>",tag_Service="<something>"} 0
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",tag_Role="<one of the role values>",tag_Service="<something>"} 0
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",tag_Role="<other role value>",tag_Service="<something>"} 0

Anything else?

I tried updating the regex in pkg/services.go to also extract the ServiceName dimension (service/(?P<ClusterName>[^/]+)/(?P<ServiceName>[^/]+)). This seems to fix the first problem, since only metrics from matching services are then scraped, but it does not fix the second: if two or more services match searchTags, their metrics still all carry the tag labels of a single service instead of their own.
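To make the first problem concrete, here is a minimal Go sketch of ARN-based dimension matching. It is not YACE's actual code: `extractDimensions` and `matches` are hypothetical helpers, and `clusterOnly` is an assumed stand-in for a pattern that captures only ClusterName (the real pattern in pkg/services.go is not reproduced here). `clusterAndService` is the pattern quoted above. With the cluster-only pattern, the matching-service ARN matches metrics from every service in the cluster; once ServiceName is extracted too, it matches only its own service.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical stand-in for a cluster-only pattern: it captures just
// ClusterName, so every service metric in the cluster looks like a match.
var clusterOnly = regexp.MustCompile(`:service/(?P<ClusterName>[^/]+)/`)

// Pattern proposed in the issue: captures both ClusterName and ServiceName.
var clusterAndService = regexp.MustCompile(`service/(?P<ClusterName>[^/]+)/(?P<ServiceName>[^/]+)`)

// extractDimensions returns the named capture groups of re found in arn.
func extractDimensions(re *regexp.Regexp, arn string) map[string]string {
	dims := map[string]string{}
	match := re.FindStringSubmatch(arn)
	if match == nil {
		return dims
	}
	for i, name := range re.SubexpNames() {
		if i > 0 && name != "" {
			dims[name] = match[i]
		}
	}
	return dims
}

// matches reports whether every dimension extracted from the resource ARN
// agrees with the metric's dimensions (and at least one was extracted).
func matches(resourceDims, metricDims map[string]string) bool {
	for k, v := range resourceDims {
		if metricDims[k] != v {
			return false
		}
	}
	return len(resourceDims) > 0
}

func main() {
	arn := "arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service"
	metrics := []map[string]string{
		{"ClusterName": "my-cluster-name", "ServiceName": "matching-service"},
		{"ClusterName": "my-cluster-name", "ServiceName": "other-service"},
	}
	for _, re := range []*regexp.Regexp{clusterOnly, clusterAndService} {
		dims := extractDimensions(re, arn)
		for _, m := range metrics {
			fmt.Printf("%v -> %s: %v\n", dims, m["ServiceName"], matches(dims, m))
		}
	}
}
```

Under these assumptions the cluster-only pattern reports `true` for both services, while the cluster-and-service pattern reports `true` only for matching-service.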

In my account at least, the AWS/ECS metric dimensions seem to be returned in the order [ServiceName, ClusterName]. The logic in getFilteredMetricDatas uses the last resource that matches the last dimension, even if earlier dimensions do not match that resource. I'm not sure what the impact would be of changing getFilteredMetricDatas to select only resources that match all dimensions; the function seems quite complex at the moment.
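The second problem can be modelled the same way. This is a simplified, hypothetical model of the two selection strategies described above, not the real getFilteredMetricDatas: `lastMatchWins` lets any resource that agrees on a single dimension overwrite the previous pick, so with dimensions ordered [ServiceName, ClusterName] the ClusterName pass decides the result, while `matchAll` accepts only resources whose every extracted dimension agrees with the metric. The `Resource` and `Dimension` types, the pre-extracted `Dims`, and the Role values `role-a`/`role-b`/`cluster-role` are made up for illustration.

```go
package main

import "fmt"

// Dimension is one CloudWatch metric dimension (hypothetical type).
type Dimension struct{ Name, Value string }

// Resource is a discovered resource whose dimension values are assumed to
// have already been extracted from its ARN (hypothetical type).
type Resource struct {
	ARN  string
	Dims map[string]string
	Tags map[string]string
}

// lastMatchWins loosely mimics the behaviour described in the issue: for each
// metric dimension in order, any resource agreeing on that one dimension
// overwrites the previous pick, so the last dimension decides the result.
func lastMatchWins(dims []Dimension, resources []Resource) *Resource {
	var picked *Resource
	for _, d := range dims {
		for i := range resources {
			if v, ok := resources[i].Dims[d.Name]; ok && v == d.Value {
				picked = &resources[i]
			}
		}
	}
	return picked
}

// matchAll accepts only resources whose every extracted dimension agrees with
// the metric, preferring the most specific one (most dimensions matched).
func matchAll(dims []Dimension, resources []Resource) *Resource {
	metric := map[string]string{}
	for _, d := range dims {
		metric[d.Name] = d.Value
	}
	var best *Resource
	for i := range resources {
		r := &resources[i]
		if len(r.Dims) == 0 {
			continue
		}
		ok := true
		for k, v := range r.Dims {
			if metric[k] != v {
				ok = false
				break
			}
		}
		if ok && (best == nil || len(r.Dims) > len(best.Dims)) {
			best = r
		}
	}
	return best
}

func main() {
	resources := []Resource{
		{ARN: "arn:aws:ecs:ap-southeast-2:1234567890:cluster/my-cluster-name",
			Dims: map[string]string{"ClusterName": "my-cluster-name"},
			Tags: map[string]string{"Role": "cluster-role"}},
		{ARN: "arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",
			Dims: map[string]string{"ClusterName": "my-cluster-name", "ServiceName": "matching-service"},
			Tags: map[string]string{"Role": "role-a"}},
		{ARN: "arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",
			Dims: map[string]string{"ClusterName": "my-cluster-name", "ServiceName": "other-service"},
			Tags: map[string]string{"Role": "role-b"}},
	}
	for _, svc := range []string{"matching-service", "other-service"} {
		// Dimension order as reported in the issue: [ServiceName, ClusterName].
		dims := []Dimension{{"ServiceName", svc}, {"ClusterName", "my-cluster-name"}}
		fmt.Printf("%s: last-match-wins=%s match-all=%s\n", svc,
			lastMatchWins(dims, resources).Tags["Role"], matchAll(dims, resources).Tags["Role"])
	}
}
```

In this model, last-match-wins labels both services with `role-b` (reproducing the symptom above), while match-all gives each service its own Role. Preferring the resource with the most matched dimensions is one way to stop a cluster-level resource, which only carries ClusterName, from shadowing the more specific service; whether YACE should behave this way is an open design question.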

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15 (10 by maintainers)

Most upvoted comments

I think we should; we've been using it internally for a while now and got no complaints. @cristiangreco is out on vacation until next week, but +1. Also poking @kgeckhart for opinions.

Sorry, I’m not comfortable granting that kind of read access into my AWS account. Thank you for offering to re-test though.