aws-xray-daemon: Bash tools missing from 3.3.0 release

If one ECS Fargate container of a task has a health check configured, then all containers must have a health check. With 3.2.0, I could use true as a dummy health check. With 3.3.0, this health check fails because true is no longer available in the image.

It would be nice if the daemon had an explicit health check command.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 5
  • Comments: 28 (2 by maintainers)

Most upvoted comments

I don’t understand why there hasn’t been any movement on this issue. It’s been a year.

It’s crazy for the same tag on different repos to point to fundamentally different images. Nothing has to be re-invented here, the precedent is already there: you add a suffix to the tag according to what it is based on (exactly as @Project0 suggested) and push all of the tags to all of the repos, e.g.

  • 3.3.3 (based on Amazon Linux 2)
  • 3.3.3-slim (from scratch)
  • 3.3.3-debian, etc.

activity

@thomasritz here in #9 is my small experiment with something closer to a real health check

["CMD", "/xray", "--version", "||", "exit 1"]
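One caveat with the array above: in the CMD (exec) form no shell is involved, so "||" and "exit 1" are not interpreted as shell operators. A minimal sketch of that distinction, assuming ECS follows Docker's HEALTHCHECK semantics for CMD and CMD-SHELL; resolveHealthCheckArgv is a hypothetical helper written for illustration, not part of any AWS SDK:

```typescript
// Sketch of how a health check command array becomes the argv that runs
// inside the container. Assumption: CMD passes arguments verbatim and
// CMD-SHELL hands the string to the image's default shell (/bin/sh -c).
type HealthCheckCommand = string[];

function resolveHealthCheckArgv(command: HealthCheckCommand): string[] {
  const [mode, ...rest] = command;
  switch (mode) {
    case "CMD":
      // Exec form: no shell, every element is a literal argument.
      return rest;
    case "CMD-SHELL":
      // Shell form: only works if the image actually ships /bin/sh.
      return ["/bin/sh", "-c", rest.join(" ")];
    default:
      throw new Error(`unsupported health check mode: ${String(mode)}`);
  }
}

// "||" and "exit 1" end up as plain arguments to /xray here.
const execForm = resolveHealthCheckArgv(["CMD", "/xray", "--version", "||", "exit 1"]);
console.log(execForm);

// Here the shell would interpret "||", but a scratch image has no shell.
const shellForm = resolveHealthCheckArgv(["CMD-SHELL", "/xray --version || exit 1"]);
console.log(shellForm);
```

Under these assumptions, the exec form reduces to running /xray --version with two extra arguments, and the result depends only on how the binary handles them.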

Due to the lack of a health check for ECS, I just used true as a dummy health check like so in taskdef.json:

{
  "name": "xray-daemon",
  "healthCheck": {
    "command": ["true"]
  },
  "image": "amazon/aws-xray-daemon:3.3.2",
  ...
}

I’d prefer to have a real health check command if that’s possible.

Thanks for such a great tool. And thanks for your support.

My needs are simple. Just need bash. sh is sufficient.

Hey @awssandra, shipping the same image tag as 2 different artefacts to 2 different registries seems a bit of an anti-pattern to me. I would expect aws-xray-daemon:3.3.2 to be the same image regardless of the registry I pull it from.

I understand the reason for having 2 Dockerfiles, since the amount of embedded tools the end user requires varies, but I would expect the tag to be consistent regardless of its location. I think the problem you are going to have is that folks hitting the Docker Hub rate limiting issue are just going to swap out the registries in their Task definitions and expect it to work. But as we’ve seen in this thread, their existing health checks will break when moving from Hub to ECR and they won’t know why.

If you look at some of the Docker official images the naming convention is x.y.z-baseimage when multiple base images are used:

  • Python has 3.9.4-buster and 3.9.4-alpine
  • Nginx has 1.20 and 1.20-alpine
  • Haproxy has 2.30.10 and 2.30.10-alpine

So my suggestion would be to keep DockerHub and ECR Public in sync, with aws-xray-daemon:3.3.2 being the same image on both registries. But at the same time create a 3.3.2-amazonlinux or 3.3.2-slim (or whatever naming convention suits) stored in both registries with more tools and the ability to use a healthcheck.

For example:

  • aws-xray-daemon:3.3.2 from Dockerfile
  • aws-xray-daemon:3.3.2-amazonlinux from Dockerfile.amazonlinux
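The x.y.z-baseimage convention suggested above could be sketched roughly as follows. This is an illustration only: parseImageTag and referencesFor are hypothetical helpers, and the two repository names are the ones mentioned elsewhere in this thread.

```typescript
// Hypothetical helpers illustrating the x.y.z[-variant] tag convention.
interface ParsedTag {
  version: string;        // e.g. "3.3.2"
  variant: string | null; // e.g. "amazonlinux"; null for the default image
}

function parseImageTag(tag: string): ParsedTag {
  const dash = tag.indexOf("-");
  const version = dash === -1 ? tag : tag.slice(0, dash);
  const variant = dash === -1 ? null : tag.slice(dash + 1);
  if (!/^\d+(\.\d+)*$/.test(version) || variant === "") {
    throw new Error(`unrecognized tag format: ${tag}`);
  }
  return { version, variant };
}

// Repositories named in this thread; every tag would be pushed to both,
// so the same tag resolves to the same image wherever it is pulled from.
const repositories = [
  "amazon/aws-xray-daemon",
  "public.ecr.aws/xray/aws-xray-daemon",
];

function referencesFor(tag: string): string[] {
  parseImageTag(tag); // reject tags that do not follow the convention
  return repositories.map((repo) => `${repo}:${tag}`);
}

console.log(parseImageTag("3.3.2-amazonlinux"));
console.log(referencesFor("3.3.2"));
```

The point of the sketch is that the variant lives in the tag, not in the choice of registry, so swapping registries in a task definition would not silently change the base image.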

To follow up on @srprash’s comment, we are specifically considering migrating both the public ECR and DockerHub images to debian:bullseye-slim. It has all of the basic bash tools you’d expect but at half the size of amazonlinux. If anyone has any comments or concerns related to this, it would be great to hear them now. Thanks!

Hi @cynicaljoy @thomasritz, a real health check is on our backlog but unfortunately we may not be able to prioritize it for some time. However, we are working towards a base image with bash tools so that a workaround for health checks is possible. Thanks for your patience.

I fully appreciate the team’s effort to cut down the image size, but I too would appreciate input as to a real health check.

“aws-xray-daemon:3.3.2 to be the same image regardless of the registry I pull it from”

Already ran into this. It was confusing, as I expected it to be fixed but was getting the same problems when using ECR, IIUC.

New release 3.3.2 has been completed. Will be updating the docs accordingly.

Hey CarlosDomingues,

Apologies for the issues.

We will be republishing the Dockerhub images using the amazonlinux base image, and will maintain this as the standard solution for Dockerhub users, as it was established as the standard from day one.

We will be updating guidance on this in our public documentation, as well as supplying build instructions for those who wish to incorporate the extra tools into their ECR images for healthchecks or other operational items.

I tried a different approach which gives me some idea that the service is currently running.

> xray
2023-11-27T19:17:32Z [Info] Initializing AWS X-Ray daemon 3.3.9
2023-11-27T19:17:32Z [Error] listen udp 127.0.0.1:2000: bind: address already in use

You can then grep this to verify that the service is already listening on that port:

xray | grep 'bind: address already in use' || exit 0

Usage in AWS CDK:

const xray = taskDefinition.addContainer("XRay", {
      image: cdk.aws_ecs.ContainerImage.fromRegistry("amazon/aws-xray-daemon"),
      cpu: 32,
      memoryReservationMiB: 256,
      essential: true,
      portMappings: [
        {
          containerPort: 2000,
          protocol: cdk.aws_ecs.Protocol.UDP,
        },
      ],
      // TODO: Replace with a better health check in the future.
      // https://github.com/aws/aws-xray-daemon/issues/119
      healthCheck: {
        command: [
          "CMD-SHELL",
          "xray | grep 'bind: address already in use' || exit 0",
        ],
      },
    });

(sorry if this was already suggested)
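A note on the exit-status logic of the command above, as a small model rather than a statement about the daemon itself. It assumes plain sh semantics (a pipeline's status is that of its last command, no pipefail) and the usual convention that exit code 0 means healthy. checkExitStatus is a hypothetical function. The sketch does not cover whether the log line reaches stdout, or what happens when the port is free and the second xray process keeps running until the health check times out.

```typescript
// Models the exit status of `<pipeline> || exit <fallback>` in sh, where
// the pipeline's status is grep's status.
function checkExitStatus(grepExit: number, fallbackExit: number): number {
  // `a || b` runs b only when a fails (non-zero status).
  return grepExit === 0 ? 0 : fallbackExit;
}

const GREP_MATCHED = 0;  // grep found "bind: address already in use"
const GREP_NO_MATCH = 1; // grep finished without seeing that line

// As posted (`|| exit 0`): both outcomes report success.
const asPosted = [GREP_MATCHED, GREP_NO_MATCH].map((g) => checkExitStatus(g, 0));

// Variant with `|| exit 1`: a missing match is reported as a failure.
const strictVariant = [GREP_MATCHED, GREP_NO_MATCH].map((g) => checkExitStatus(g, 1));

console.log({ asPosted, strictVariant });
```

In this model, the posted command can only mark the container unhealthy through a timeout, not through its exit code. Switching the fallback to exit 1 would make a missing match an explicit failure, at the cost of depending on the exact log wording.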

Doing something like ["CMD", "/xray", "--health-check"] would be pretty slick. Would continue to keep the image small and provide a consistent way for the CLI to health check itself.

For the scratch image (public.ecr.aws/xray/aws-xray-daemon:3.3.3): is there something that can be stubbed in that will work for the health check?

This works but I’m not sure how robust it is.

Just to bump this up. We have a process of hardening original images before we run them in our infrastructure. The public.ecr.aws/xray/aws-xray-daemon:3.3.3 image contains three vulnerabilities, which could be fixed by applying this in our Dockerfile:

RUN apk update                             \
    curl    $(: 'Fixes ALAS2-2021-1693')   \
    rpm     $(: 'Fixes ALAS2-2021-1689')   \
    openssl $(: 'Fixes ALAS2-2021-1687')

but it fails with:

executor failed running [/bin/sh -c apk update curl $(: 'Fixes ALAS2-2021-1693') rpm $(: 'Fixes ALAS2-2021-1689') openssl $(: 'Fixes ALAS2-2021-1687')]: unable to find groups for spec root: invalid argument

Also, the fact that xray/aws-xray-daemon:3.3.3 is alpine-based in public.ecr.aws, but amazonlinux-based in DockerHub… that’s a huge antipattern.

Hi @ollypom @fosrias @thomasritz @CarlosDomingues We are looking into using a base image which is slim enough but also has the tools that were being used from amazonlinux. Can you provide a list of essential tools that you’d want the x-ray daemon images to have?

Thanks!