aws-xray-daemon: Single XRay Daemon in ECS Cluster not sending traces

Hello,

Just like #24 I am trying to deploy a single xray daemon to serve every service in my ECS Cluster.

If I deploy the agent as a container in the service everything works fine. If I try to deploy it as a separate service (behind Load Balancer and tied to Route53) my services are unable to send segments to the daemon.

I can correctly see

[Debug] Send xx telemetry record(s)

when the agent is inside the service configured to have AWS_XRAY_DAEMON_ADDRESS=xray-agent:2000

But if I try to reach it outside the service as a separate microservice I only get

[Debug] Skipped telemetry data as no segments found

using AWS_XRAY_DAEMON_ADDRESS=xray.myenvironment.mydomain:2000

Note that the host is reachable from my local machine, from inside the EC2 host and from inside the specific container, so it’s not a network issue.

The task has the same IAM policy attached, and the security group allows my whole VPC to reach port 2000 via UDP/TCP:

- PolicyName: xray-writeonly
  PolicyDocument:
    Statement:
      - Action:
          - "xray:PutTraceSegments"
          - "xray:PutTelemetryRecords"
        Effect: "Allow"
        Resource:
          - "*"

And this is the CloudFormation template

- Name: xray-agent
  Essential: true
  Image: amazon/aws-xray-daemon
  Cpu: 32
  Command:
    - --log-level=dev
  Memory: 64
  PortMappings:
    - ContainerPort: 2000
      HostPort: 0
      Protocol: udp
  Environment:
    - Name: AWS_DEFAULT_REGION
      Value: !Ref "AWS::Region"
    - Name: AWS_REGION
      Value: eu-west-1
    - Name: AWS_SDK_LOAD_CONFIG
      Value: "1"
  LogConfiguration:
    LogDriver: awslogs
    Options:
      awslogs-group: !Ref AWS::StackName
      awslogs-region: !Ref AWS::Region
      awslogs-stream-prefix: "xray-agent"

Since the daemon isn’t outputting any error logs, it’s difficult for me to debug this.

thanks

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20 (7 by maintainers)

Most upvoted comments

From Comment - https://github.com/aws/aws-xray-daemon/issues/53#issuecomment-647820079

After discussing with the ECS team, they’ve recommended you should configure the X-Ray daemon using the daemon service scheduler type, as described in these docs: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html

This way, you don’t need to bother with the costs & configuration associated with deploying the daemon as a standalone service behind a load balancer. It deploys one daemon per container instance automatically, which means all your tasks will be able to communicate with the daemon on their instance using localhost as intended.
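For readers following this recommendation, a minimal CloudFormation sketch of a service using the daemon scheduling strategy might look like the fragment below. The resource names `XRayDaemonService`, `Cluster` and `XRayDaemonTaskDefinition` are hypothetical placeholders, not taken from this thread; the task definition is assumed to contain the xray-agent container shown in the original post.

```yaml
# Sketch only: resource names are hypothetical placeholders.
XRayDaemonService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref Cluster                          # assumed: an existing ECS cluster resource
    TaskDefinition: !Ref XRayDaemonTaskDefinition  # assumed: task definition with the xray-agent container
    SchedulingStrategy: DAEMON                     # one task per container instance; DesiredCount is omitted
    LaunchType: EC2                                # the DAEMON strategy is not available on Fargate
```

How application tasks reach the daemon on their instance depends on the network mode. With the dynamic `HostPort: 0` mapping from the original template, the host port changes per task, so this setup would likely need a fixed host port (for example `HostPort: 2000`) or host networking; treat that as an assumption to verify against your own cluster.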

That method only appears to be appropriate if you are not using Fargate.

As I am using Fargate, what is the preferred strategy for Multi-AZ? My thoughts go down the route of creating another service with the xray-daemon as the container, with at least one instance per zone, but we would still need that behind an ELB in case the node in a specific zone died.

Does the ECS team have other recommendations for X-ray daemons with Fargate containers?

@KeynesYouDigIt I’d suggest taking a look at the example task definition I referenced above and ensuring you’re not using links. Another thing to check is that your task role has write permission for the X-Ray service. If you’re still having issues please open a separate GitHub issue and post:

  1. Your task definition
  2. Your X-Ray daemon logs
  3. Your application logs with X-Ray debug mode enabled

@ryanmorseglu Unfortunately not.

@willarmiros I think the ticket should be reopened on the basis that it requires AWS to revisit the solution they recommended, as it doesn’t work with Fargate.

@mauroartizzu Glad to hear! I’ll close this issue then; feel free to comment again if you have further difficulties.

@willarmiros that was the first path I took and it worked seamlessly. The problem is that we have hundreds of microservices, and even keeping memory settings low will result in a hundred daemons. We tried it, but it increased the number of EC2 instances needed per cluster.

Anyway, thank you a lot for the clarifications 😃 Now I have a better understanding of how the daemon works, and this issue might serve as a solution for anyone else facing the same problem.