vector: Missing label from docker_job source

Vector Version

0.11

Vector Configuration File

[sources.docker]
  type = "docker_logs"

[sinks.loki]
  encoding = "text"
  endpoint = "http://loki:3100"
  inputs = ["docker"]
  type = "loki"

  labels.instance = "{{ label.com.docker.swarm.task.id }}"
  labels.node = "{{ label.com.docker.swarm.node.id }}"
  labels.job = "{{ label.com.docker.swarm.service.name }}"

Expected Behavior

All Loki log lines carry the value of the com.docker.swarm.task.id label as the instance label

Actual Behavior

Log lines for some tasks/instances arrive in Loki with only the node and job labels; the instance label is missing

Example Data

Additional Context

Vector is running as a service on Docker Swarm 20.10, alongside the containers whose logs the docker_logs source monitors. Most tasks get the instance/task label set properly, but the ones I was interested in looking at (short-lived tasks that errored shortly after starting) carried only the job and node labels. Inspecting the task does show the proper labels applied, so either Vector is not picking them up or Loki is dropping them somehow.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 18 (8 by maintainers)

Most upvoted comments

Hi @nivekuil !

With the latest Vector it should be labels.compose_service = '{{ label."com.docker.compose.service" }}'. This was due to some changes to unify path names appearing in configuration with VRL. You can find more details here: https://vector.dev/highlights/2022-03-22-0-21-0-upgrade-guide/#path-syntax-changes
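Applied to the Swarm labels from the configuration at the top of this issue, the quoted-path form would look roughly like this (a sketch assuming Vector 0.21 or later; the label keys are taken from the original config, so verify them against your deployment):

```toml
[sinks.loki]
  type = "loki"
  inputs = ["docker"]
  endpoint = "http://loki:3100"
  encoding = "text"

  # Under the 0.21+ path syntax, label keys containing dots must be
  # quoted so the whole key is treated as a single path segment.
  labels.instance = '{{ label."com.docker.swarm.task.id" }}'
  labels.node = '{{ label."com.docker.swarm.node.id" }}'
  labels.job = '{{ label."com.docker.swarm.service.name" }}'
```

Note the single quotes around the TOML values, as corrected later in the thread: double quotes would require escaping the inner quotes around the label key.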

Hi @pgassmann . I moved it since the next release is on Tuesday and it seemed unlikely that this would get in as it hadn’t been assigned yet. I’ll see if it is feasible to address before then though. Thanks for the bump!

@pgassmann oops, sorry, those should have been single-quotes, I updated that comment.

Collecting logs directly from the Docker Engine is known to have performance problems for very large setups. If you have a large setup, please consider alternative collection methods, such as the Docker syslog or Docker journald logging drivers.

I could not find the source of “is known”. Which aspect of performance is affected, and how? And what counts as a large setup: many containers, or many log lines?

How does Vector get the logs? From the HTTP API? And how is reading the logs from journald better? https://docs.docker.com/engine/api/v1.41/#operation/ContainerLogs https://docs.docker.com/engine/api/v1.41/#operation/ContainerAttach

We use the HTTP ContainerLogs endpoint you linked. I’m actually not aware of what the limits of ingesting logs from the API are or why journald would be preferred. It looks like @binarylogic added that note to the docs in #4547, maybe he has some more details.
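For comparison, the journald-based alternative mentioned in the docs would route container logs through the Docker journald logging driver and read them with Vector's journald source. A minimal sketch (option names assumed from Vector's journald source documentation; check them against your Vector version):

```toml
# Prerequisite (not part of the Vector config): set the Docker daemon's
# logging driver in /etc/docker/daemon.json:
#   { "log-driver": "journald" }

[sources.docker_journald]
  type = "journald"
  # Only keep journal entries emitted by the Docker service unit.
  include_units = ["docker.service"]

[sinks.loki]
  type = "loki"
  inputs = ["docker_journald"]
  endpoint = "http://loki:3100"
  encoding = "text"
```

With this setup Vector tails the systemd journal instead of holding one HTTP log stream per container against the Docker Engine API; note that container labels then surface as journal fields rather than the docker_logs label fields used above.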