datadog-agent: Agent v6.5.2 broken logs from docker
Output of the info page (if this is a bug)
==============
Agent (v6.5.2)
==============
Status date: 2018-09-28 09:34:53.577571 UTC
Pid: 808
Python Version: 2.7.15
Logs:
Check Runners: 4
Log Level: info
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: -346µs
System UTC time: 2018-09-28 09:34:53.577571 UTC
Host Info
=========
bootTime: 2018-09-27 19:59:30.000000 UTC
kernelVersion: 3.10.0-693.2.2.el7.x86_64
os: linux
platform: centos
platformFamily: rhel
platformVersion: 7.4.1708
procs: 148
uptime: 11s
Hostnames
=========
hostname: k***s.com
socket-fqdn: h***2.hostwindsdns.com.
socket-hostname: h***2.hostwindsdns.com
hostname provider: configuration
=========
Collector
=========
Running Checks
==============
cpu
---
Instance ID: cpu [OK]
Total Runs: 3,261
Metric Samples: 6, Total: 19,560
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
disk (1.3.0)
------------
Instance ID: disk:e5dffb8bef24336f [OK]
Total Runs: 3,261
Metric Samples: 58, Total: 175,170
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 40ms
file_handle
-----------
Instance ID: file_handle [OK]
Total Runs: 3,260
Metric Samples: 5, Total: 16,300
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
io
--
Instance ID: io [OK]
Total Runs: 3,261
Metric Samples: 26, Total: 84,768
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 3ms
load
----
Instance ID: load [OK]
Total Runs: 3,260
Metric Samples: 6, Total: 19,560
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
memory
------
Instance ID: memory [OK]
Total Runs: 3,261
Metric Samples: 17, Total: 55,437
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
network (1.6.1)
---------------
Instance ID: network:2a218184ebe03606 [OK]
Total Runs: 3,261
Metric Samples: 32, Total: 104,340
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 1ms
ntp
---
Instance ID: ntp:b4579e02d1981c12 [OK]
Total Runs: 3,261
Metric Samples: 1, Total: 3,261
Events: 0, Total: 0
Service Checks: 1, Total: 3,261
Average Execution Time : 31ms
uptime
------
Instance ID: uptime [OK]
Total Runs: 3,261
Metric Samples: 1, Total: 3,261
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
========
JMXFetch
========
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
CheckRunsV1: 3,260
Dropped: 0
DroppedOnInput: 0
Errors: 81
Events: 0
HostMetadata: 0
IntakeV1: 249
Metadata: 0
Requeued: 87
Retried: 82
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 6,769
TimeseriesV1: 3,260
API Keys status
===============
API key ending in 24fb5 for endpoint https://app.datadoghq.com: API Key valid
==========
Logs Agent
==========
custom
------
Type: docker
Name: front-container
Status: Pending
=========
DogStatsD
=========
Checks Metric Sample: 533,936
Event: 1
Events Flushed: 1
Number Of Flushes: 3,260
Series Flushed: 420,505
Service Check: 29,401
Service Checks Flushed: 32,652
Describe what happened: The Logs Agent doesn't collect any logs; the source stays in the Pending state.
Describe what you expected: Agent v6.4.2 works as expected on the same server (installed 4/9/2018 13:08):
custom
------
Type: docker
Name: front-container
Status: OK
Inputs: 99b00c7d6467b686ce83333dfb86e5297cd20cd1810b99e0ac32dd218cadade1
Steps to reproduce the issue:
Agent v6.4.2, Docker version 18.06.1-ce, build e68fc7a - working
Agent v6.5.2, Docker version 18.06.1-ce, build e68fc7a - not working
/etc/datadog-agent/conf.d/custom.yaml
logs:
  - type: docker
    name: front-container
    source: nginx
    service: docker
datadog.yaml
dd_url: https://app.datadoghq.com
api_key: a***5
hostname: k***s.com
tags:
  - role:shop-front
logs_enabled: true
Additional environment details (Operating System, Cloud provider, etc):
CentOS 7, Docker version 18.06.1-ce
DigitalOcean and Hostwinds - same problem on both
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 5
- Comments: 21 (7 by maintainers)
Also seeing a similar problem with v6.5.2. We are using Docker labels on our app containers to configure the Datadog container agent on hosts running Docker version 18.06.1-ce. Seems like Datadog is having a problem with parsing container labels, which didn’t change when we upgraded Datadog.

2018/10/05 additional info:
We are running our agent containers with the following environment variables:
Example of a container service with Docker labels:
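A hypothetical docker-compose sketch of that kind of setup, using the com.datadoghq.ad.logs label convention discussed later in this thread (all service names and values are illustrative assumptions, not the commenter's actual configuration):

version: "2"
services:
  datadog-agent:
    image: datadog/agent:6.5.2
    environment:
      - DD_API_KEY=<YOUR_API_KEY>
      # Enable the Logs Agent inside the containerised agent
      - DD_LOGS_ENABLED=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
  web:
    image: nginx:latest
    labels:
      # Autodiscovery log label: tells the agent which source/service to tag this container's logs with
      com.datadoghq.ad.logs: '[{"source": "nginx", "service": "web"}]'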
Hello @jalessio,
So it is indeed partially linked. The situation we had was that badly formatted annotations/labels for metrics or logs were breaking the entire collection of data for that container. Now, logs and metrics annotations can fail independently without blocking the other data type's collection.
The next agent version 6.8 should solve all the issues raised in this thread.
Ohh sorry, it looks like the issue is solved… there was probably a wrong conf in the nginx image. Thank you for the help!!
@btsuhako many thanks for the quick reply! I’ll try this out today.
@jalessio we’re successfully running the 6.7.0 agent on AWS Linux 2 hosts. Logging and APM work as expected.
Datadog Agent task definition -> https://gist.github.com/btsuhako/097a2e0d7932cca588cfcdcdf36dbb88
Sample ECS service task definition -> https://gist.github.com/btsuhako/33c1d3d6a2bbee52afa4cf92d3df1f6b
We build our Docker images without any labels, and apply the needed ones at runtime with the task definition. Note that we use only 1 label for our NodeJS application, and 4 labels for the nginx reverse proxy sidecar.
From @NBParis https://github.com/DataDog/datadog-agent/issues/2383#issuecomment-428104773, it seems like you can use 1 label (com.datadoghq.ad.logs) or all 4 (com.datadoghq.ad.instances, com.datadoghq.ad.check_names, com.datadoghq.ad.init_configs, com.datadoghq.ad.logs), but anything else in between may not function properly.

Hello @undiabler, @btsuhako and @johanvereshchaga.
We have identified a potential issue which might explain the behaviour you observed. Is the Datadog Agent running on the host?
If yes, would you mind adding the following lines to datadog.yaml and letting us know if, after restarting the agent, it then works fine:

Why do we need this?
As explained in the previous post, log collection was merged into the Autodiscovery feature of the Agent, which means that we now need to have it enabled. The above lines enable the Autodiscovery feature in the agent.
This is not necessary when running the containerised version of the agent as it is enabled automatically.
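A minimal sketch of what such Autodiscovery settings in datadog.yaml typically look like for Docker, based on the Datadog Docker log collection documentation (treat the exact keys as an assumption rather than the maintainer's exact snippet):

logs_enabled: true
# Enable the Docker listener and config provider so Autodiscovery
# can watch running containers and pick up their log configurations
listeners:
  - name: docker
config_providers:
  - name: docker
    polling: true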
Hello everyone,
Thanks for reporting this issue. We will definitely have a look, replicate and fix this behaviour.
There may, however, be a workaround until this is fixed. Indeed, until Agent version 6.5, it was required for Kubernetes to use configuration files to filter containers by name or image.
As it is now possible to use the Autodiscovery feature with the agent, you can do the same configuration directly in container labels or pod annotations.
Examples: https://docs.datadoghq.com/logs/log_collection/docker/?tab=nginxdockerfile#examples
This means that you now have the ability to easily:
- Collect logs from all containers by using the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL environment variable (a sketch follows below).
- Collect logs only from specific containers by not setting the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL variable and setting the labels or pod annotations on the containers that should be collected.
- Include or exclude containers with the DD_AC_INCLUDE and DD_AC_EXCLUDE variables (example).
That said, the previous configuration should still work, so this definitely needs to be fixed. I just wanted to share the new behaviour which we believe is much more dynamic and flexible.
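For the first option, a minimal sketch of the equivalent datadog.yaml form, assuming the documented logs_config mapping for DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL (not taken from this thread):

logs_enabled: true
logs_config:
  # Equivalent of DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true:
  # collect logs from every container the agent discovers
  container_collect_all: true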
Sorry for the trouble caused and once again thanks for reporting it.