datadog-agent: Datadog Agent v6 Package is huge, and contains files that probably shouldn't be included

Describe what happened:

We started installing the datadog-agent v6 from the deb repo; specifically, this is what is in our sources list:

https://apt.datadoghq.com/ stable 6

After including a change to install the datadog-agent in our finalized docker image, we realized our image ballooned from 400 MB to ~900 MB. This is a huge jump for something we only use to monitor our one app (envoy). Poking around the file tree under /opt/datadog-agent/, we also noticed a lot of dependencies that really should only be needed in a development setting. Some examples:

Linting for python (shouldn’t it have already passed linting at this point?):

/opt/datadog-agent/bin/pylint
/opt/datadog-agent/bin/epylint

Native build tools (are we really live compiling C on our boxes?):

/opt/datadog-agent/embedded/bin/automake
/opt/datadog-agent/embedded/bin/autoconf
/opt/datadog-agent/embedded/bin/compile_et
/opt/datadog-agent/embedded/share/aclocal
/opt/datadog-agent/embedded/share/autoconf
/opt/datadog-agent/embedded/share/pkgconfig

I’m sure you can find more examples of embedded things that are probably never used. Moreover, it looks like the package includes everything it could ever need for build + deploy (keeping its own copy of python/openssl/etc.).
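To make the spot check above repeatable, here is a minimal sketch that reports which of the paths listed in this issue exist under an install root, with their size in kilobytes. The function name `find_dev_tools` is made up for illustration, and the path list is only the examples named above, not a complete inventory of what the package ships.

```shell
# Report which development-only paths from this issue exist under an
# agent install root, with their on-disk size in kilobytes.
# find_dev_tools is a hypothetical helper, not part of the agent.
find_dev_tools() {
    root="${1:-/opt/datadog-agent}"
    for rel in \
        bin/pylint \
        bin/epylint \
        embedded/bin/automake \
        embedded/bin/autoconf \
        embedded/bin/compile_et \
        embedded/share/aclocal \
        embedded/share/autoconf \
        embedded/share/pkgconfig
    do
        target="$root/$rel"
        if [ -e "$target" ]; then
            # du -sk prints "<kilobytes><TAB><path>" for each existing entry
            du -sk "$target"
        fi
    done
}

# Usage (assumes the package is installed in the default location):
#   find_dev_tools /opt/datadog-agent
```

Paths that do not exist are skipped silently, so the same function can be run against different agent versions to compare what is present.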

Describe what you expected:

  • I expected the final package to be smaller by default so it doesn’t blow up my image size.
  • I didn’t expect it to add unused tools to my system that increase the scope of what I have to watch from a security perspective.
  • I would hope for the ability to build my own version of the package that could utilize system dependencies + only include the deps needed for a specific extension (or set of extensions).

Steps to reproduce the issue:

  • Install the package from the debian repo.

Additional environment details (Operating System, Cloud provider, etc):

OS: Ubuntu 18.04
Cloud provider: AWS

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 7
  • Comments: 16

Most upvoted comments

Could this issue be re-opened? Three years later it is still very much a problem. I’d really rather not have my metrics agent using up more disk space than any other package on the system, including the main application. By a lot.

This is getting worse. The size of the v7 package is now up to 1.1GB. Could this be re-opened please?

That’s useful for deployment situations where you can have multiple containers running either on the same machine or networked machines. In my case I am able to deploy a container. A single container. Any dependencies must be included in that container.

Since my container already includes python, it seems like there is no reason for dd-agent to ship a separate python installation. The same goes for any of the “embedded” software in the dd-agent.

It would be nice to be able to install JUST the dd-agent and NONE of the embedded elements. Just document what those embedded dependencies are and let us install them ourselves as needed. Like if there’s features I don’t need I shouldn’t have to install them… right? I think that’s a fairly reasonable approach.

Hi! Just wanted to see if anybody’s heard any movement on this issue; we’re seeing the same thing when including datadog in our builds—over half our final build size is from datadog!

Simply installing dd-agent is HUGE:

du -hs /opt/datadog-agent
788M	/opt/datadog-agent

There’s a whole python3.8 installation in there! 😮 This has made my container image waaaay too big. Is there a solution for this?
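To see where those megabytes actually go, a rough sketch like the following ranks the immediate children of a directory by size. `largest_children` is a hypothetical helper name; it only wraps standard `du`, `sort`, and `head`, and it skips dotfiles because it relies on a plain `*` glob.

```shell
# Print the N largest immediate children of a directory, largest first.
# Sizes are in kilobytes. largest_children is a hypothetical helper.
largest_children() {
    dir="${1:-/opt/datadog-agent}"
    count="${2:-10}"
    # du -sk gives one "<kilobytes><TAB><path>" line per child;
    # sort -rn orders by the leading number, descending.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Usage (assumes the default install location):
#   largest_children /opt/datadog-agent 5
#   largest_children /opt/datadog-agent/embedded 5
```

Running it first on the install root and then on the biggest child is usually enough to find the embedded interpreter and libraries that dominate the total.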

For reference this is what I did in my Dockerfile:

FROM python:3.9-slim

# datadog (https://app.datadoghq.com/account/settings#agent/debian)
RUN echo 'APT::Install-Recommends "false";' > /etc/apt/apt.conf.d/99no-install-recommends
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && curl -o /install_script.sh -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh \
    && chmod +x /install_script.sh \
    && DD_API_KEY="fake" DD_INSTALL_ONLY="true" DD_AGENT_MAJOR_VERSION=7 bash -c /install_script.sh \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
   
[...]

AFAIK you can now try https://registry.hub.docker.com/r/datadog/serverless-init. Idk if this can be used outside of serverless, but it seems like it should be possible.

I checked size:

# dockerfile
FROM ubuntu
COPY --from=datadog/serverless-init:1 /datadog-init /app/datadog-init

build:

docker build -f dockerfile -t datadog-init:dev .

result:

$ docker run -it --rm datadog-init:dev du -h /app/datadog-init
33M     /app/datadog-init

so this one seems small

Please open this!

For now, to work around the huge image on GCP, we could remove the agent from the image and then:

  1. Change the app so it writes all logs to the container's stdout.
  2. In Compute Engine, run the Datadog agent image as a kind of “sidecar” container that can access Docker logs (using cloud-init and a startup script that launches the agent container on instance start).
  3. In Cloud Run, skip the Datadog agent entirely and use a log sink to pass logs directly to the Datadog intake endpoint.
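For step 2, the startup script could look roughly like the sketch below. It is a reduced version of the kind of `docker run` invocation used for container log collection, not a verified production setup: the function name `run_agent_sidecar`, the `DRY_RUN` switch, and the container name `dd-agent` are illustrative, and the image tag, environment variables, and mounts should be checked against Datadog's current Docker Agent documentation before use.

```shell
# Start the Datadog agent as a separate container that tails the logs of
# other containers on the same host. Illustrative sketch only.
#   $1 = API key (required), $2 = agent image (assumed default below)
# With DRY_RUN=1 the command is printed instead of executed.
run_agent_sidecar() {
    api_key="$1"
    image="${2:-gcr.io/datadoghq/agent:7}"
    # Build the command as positional parameters so quoting is preserved.
    set -- docker run -d --name dd-agent \
        -e DD_API_KEY="$api_key" \
        -e DD_LOGS_ENABLED=true \
        -e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
        -v /var/run/docker.sock:/var/run/docker.sock:ro \
        -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
        "$image"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage from an instance startup script (the key source is up to you):
#   run_agent_sidecar "$DD_API_KEY"
```

The application image then needs no agent at all; it only has to log to stdout as in step 1, which is what keeps its size down.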