yet-another-cloudwatch-exporter: [BUG] ecs-svc discovery includes all services in a cluster, and metric labels come from an arbitrary service
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
Currently, if any service in an ECS cluster matches the searchTags in an ecs-svc discovery job, metrics for all services in that ECS cluster will be scraped.
When exportedTagsOnMetrics is specified, the tags from the matched service will be associated with metrics from all services. If multiple services in an ECS cluster match the searchTags, tags from an arbitrary service will be applied to all metrics.
Expected Behavior
If only one service in an ECS cluster matches the searchTags (and the cluster itself does not match), only that service should have its metrics scraped.
If multiple services match searchTags, the tags from each ECS service should be applied to its own metrics.
Steps To Reproduce
- Create an ECS cluster with two services, tagged with Service=<something> and Role=<something different per service>
- Configure yace to scrape ECS services with Service=<something> and Role=<one value>, and to copy the Service and Role tags to metrics:
discovery:
  exportedTagsOnMetrics:
    ecs-svc:
      - Role
      - Service
  jobs:
    - type: ecs-svc
      regions:
        - ap-southeast-2
      searchTags:
        - key: Service
          value: <something>
        - key: Role
          value: <one of the role values>
      length: 1200
      period: 60
      metrics:
        - name: CPUUtilization
          statistics: [Maximum]
          nilToZero: true
- Check the metrics. Only the matching ECS service is present in aws_ecs_svc_info, but aws_ecs_svc_cpuutilization_maximum includes the other ECS service with the wrong tags (though the dimension_ServiceName label is correct):
# HELP aws_ecs_svc_cpuutilization_maximum Help is not implemented yet.
# TYPE aws_ecs_svc_cpuutilization_maximum gauge
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="matching-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",region="ap-southeast-2",tag_Role="<one of the role values>",tag_Service="<something>"} 5.492108138082059
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="other-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",region="ap-southeast-2",tag_Role="<one of the role values>",tag_Service="<something>"} 13.422734092626918
# HELP aws_ecs_svc_info Help is not implemented yet.
# TYPE aws_ecs_svc_info gauge
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",tag_Role="<one of the role values>",tag_Service="<something>"} 0
- Remove Role from searchTags:
discovery:
  exportedTagsOnMetrics:
    ecs-svc:
      - Role
      - Service
  jobs:
    - type: ecs-svc
      regions:
        - ap-southeast-2
      searchTags:
        - key: Service
          value: <something>
      length: 1200
      period: 60
      metrics:
        - name: CPUUtilization
          statistics: [Maximum]
          nilToZero: true
- Check the metrics again. Both ECS services will be present in aws_ecs_svc_info with the correct tags (and in my case, the ECS cluster itself also matches), but the aws_ecs_svc_cpuutilization_maximum metrics all have the tags of an arbitrary matching service:
# HELP aws_ecs_svc_cpuutilization_maximum Help is not implemented yet.
# TYPE aws_ecs_svc_cpuutilization_maximum gauge
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="matching-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",region="ap-southeast-2",tag_Role="<other role value>",tag_Service="<something>"} 5.550455194040153
aws_ecs_svc_cpuutilization_maximum{account_id="1234567890",dimension_ClusterName="my-cluster-name",dimension_ServiceName="other-service",name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",region="ap-southeast-2",tag_Role="<other role value>",tag_Service="<something>"} 14.431799231013773
# HELP aws_ecs_svc_info Help is not implemented yet.
# TYPE aws_ecs_svc_info gauge
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:cluster/my-cluster-name",tag_Role="<cluster role>",tag_Service="<something>"} 0
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",tag_Role="<one of the role values>",tag_Service="<something>"} 0
aws_ecs_svc_info{name="arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",tag_Role="<other role value>",tag_Service="<something>"} 0
Anything else?
I tried updating the regex in pkg/services.go to also extract the ServiceName dimension (service/(?P<ClusterName>[^/]+)/(?P<ServiceName>[^/]+)). This seems to fix the first problem, as only metrics from matching services get scraped in that case, but it does not fix the second problem - if 2 or more services match searchTags then their metrics still all have the tag labels from a single service instead of their own tags.
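For illustration, here is a minimal standalone sketch of what that pattern extracts from a service ARN. It is not the exporter's code: the helper name dimensionsFromARN is hypothetical, and only the regular expression itself comes from the text above.

```go
package main

import (
	"fmt"
	"regexp"
)

// ecsServiceARN is the pattern quoted in the issue. How and where the
// exporter would use it is not shown here.
var ecsServiceARN = regexp.MustCompile(`service/(?P<ClusterName>[^/]+)/(?P<ServiceName>[^/]+)`)

// dimensionsFromARN is a hypothetical helper. It returns the named capture
// groups as a dimension-name -> value map, or nil if the ARN does not match.
func dimensionsFromARN(arn string) map[string]string {
	m := ecsServiceARN.FindStringSubmatch(arn)
	if m == nil {
		return nil
	}
	dims := make(map[string]string)
	for i, name := range ecsServiceARN.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have an empty name
			dims[name] = m[i]
		}
	}
	return dims
}

func main() {
	arn := "arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service"
	dims := dimensionsFromARN(arn)
	fmt.Println(dims["ClusterName"], dims["ServiceName"])
}
```

With both dimensions extracted, a service resource can be told apart from its sibling services in the same cluster, which is consistent with the first problem going away.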
In my account at least, it seems like the AWS/ECS metric dimensions are returned in the order [ServiceName, ClusterName]. The logic in getFilteredMetricDatas uses the last resource that matches the last dimension, even if earlier dimensions do not match the resource. I'm not sure what the impact would be if getFilteredMetricDatas were changed to select only resources that match all dimensions; the function seems quite complex at the moment.
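As a rough sketch of the "match all dimensions" idea, the following standalone program associates a metric with a resource only when every dimension agrees. This is not getFilteredMetricDatas: the resource and dimension types, the matchAllDimensions function, and the role-a/role-b tag values are all hypothetical, and requiring the two dimension sets to be the same size is an extra assumption made here so that the cluster resource does not absorb service-level metrics.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a discovered, tagged resource.
type resource struct {
	ARN        string
	Dimensions map[string]string // values extracted from the ARN
	Tags       map[string]string
}

// dimension is a hypothetical stand-in for one CloudWatch metric dimension.
type dimension struct{ Name, Value string }

// matchAllDimensions returns the first resource whose dimensions agree with
// every metric dimension, regardless of dimension order, or nil if none does.
func matchAllDimensions(resources []resource, dims []dimension) *resource {
	for i := range resources {
		r := &resources[i]
		if len(r.Dimensions) != len(dims) {
			continue // assumption: the dimension sets must be the same size
		}
		matched := true
		for _, d := range dims {
			if v, ok := r.Dimensions[d.Name]; !ok || v != d.Value {
				matched = false
				break
			}
		}
		if matched {
			return r
		}
	}
	return nil
}

func main() {
	resources := []resource{
		{
			ARN:        "arn:aws:ecs:ap-southeast-2:1234567890:cluster/my-cluster-name",
			Dimensions: map[string]string{"ClusterName": "my-cluster-name"},
			Tags:       map[string]string{"Role": "cluster-role"},
		},
		{
			ARN:        "arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/matching-service",
			Dimensions: map[string]string{"ClusterName": "my-cluster-name", "ServiceName": "matching-service"},
			Tags:       map[string]string{"Role": "role-a"},
		},
		{
			ARN:        "arn:aws:ecs:ap-southeast-2:1234567890:service/my-cluster-name/other-service",
			Dimensions: map[string]string{"ClusterName": "my-cluster-name", "ServiceName": "other-service"},
			Tags:       map[string]string{"Role": "role-b"},
		},
	}

	// Dimension order as observed in the issue: [ServiceName, ClusterName].
	dims := []dimension{
		{Name: "ServiceName", Value: "other-service"},
		{Name: "ClusterName", Value: "my-cluster-name"},
	}
	if r := matchAllDimensions(resources, dims); r != nil {
		fmt.Println(r.ARN, r.Tags["Role"])
	}
}
```

Under these assumptions each service's metrics pick up that service's own tags, and a metric for a service that was not discovered matches no resource at all. Whether the same rule is safe for other AWS namespaces handled by the exporter is not addressed by this sketch.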
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 15 (10 by maintainers)
I think we should, we’ve been using it internally for a while now, and got no complaints. @cristiangreco is out for vacation until next week, but +1. Also poking @kgeckhart for opinions
Sorry, I’m not comfortable granting that kind of read access into my AWS account. Thank you for offering to re-test though.