datadog-agent: Setting up the agent to support OTLP ingest does not work.

Output of the info page (if this is a bug)

I don't think this is a bug, I'm just unsure how to configure this correctly.

Describe what happened:

This is running on a Windows machine with the agent installed (7.32.4.1), which reports in the logs:

Listening for traces at http://localhost:8126

and our application is currently sending traces via APM.

I am trying to add support for OpenTelemetry alongside the existing APM setup.

I followed the instructions here: https://docs.datadoghq.com/tracing/setup_overview/open_standards/#otlp-ingest-in-datadog-agent and added this config to the datadog.yml:

experimental:
  otlp:
    receiver:
      protocols:
        grpc:
        http:

When I restarted the agent I saw this in the logs:

2022-04-13 15:10:48 BST | TRACE | WARN | (pkg/util/log/log.go:630 in func1) | Unknown key in config file: experimental.otlp.receiver.protocols.grpc
2022-04-13 15:10:48 BST | TRACE | WARN | (pkg/util/log/log.go:630 in func1) | Unknown key in config file: experimental.otlp.receiver.protocols.http

I searched a little and found this issue: https://github.com/DataDog/helm-charts/issues/529

which seems to imply that this feature is no longer experimental; however, I was not able to get it to function. Things I tried:

using this config in the agent datadog.yml:

otlp:
  receiver:
    protocols:
      grpc:
      http:

but I got essentially the same error as above.

Setting the following environment variables (based on another issue):

  • OTEL_EXPORTER_OTLP_ENDPOINT to http://localhost:4317
  • DD_OTLP_HTTP_PORT to 4317
  • DD_OTLP_GRPC_PORT to 4318
  • OTLP_COLLECTOR to http://localhost:4317

but after restarting the agent I see no other messages apart from:

Listening for traces at http://localhost:8126

which implies this did not work. I also tried restarting the app just in case.

The app is a .NET 6 web app with this configuration:

services.AddOpenTelemetryTracing(
    builder =>
    {
        builder
            .SetSampler(new AlwaysOnSampler())
            .AddSource("MySource")
            .SetResourceBuilder(
                ResourceBuilder.CreateDefault()
                    .AddService(serviceName: "MyService", serviceVersion: "1.0.0"))
            .AddOtlpExporter(config =>
            {
                config.Endpoint = new Uri("http://localhost:4317");
            })
            .AddNServiceBusInstrumentation();
    });

Describe what you expected:

That following the instructions for enabling OTLP ingest would work correctly, or that alternative instructions would be available if the feature is no longer experimental.

Steps to reproduce the issue: As above

Additional environment details (Operating System, Cloud provider, etc):

Locally hosted Windows Server 2019 VM
Datadog Agent 7.32.4.1
Datadog .NET tracer 64-bit 1.27.1

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 21 (3 by maintainers)

Most upvoted comments

For anyone reading this issue, I figured my issue out.

The traces were correctly sent to Datadog. My configuration (explained here: https://github.com/DataDog/helm-charts/issues/529#issuecomment-1099421478) works perfectly well.

I wasn’t finding my traces because I was looking for them in the env: staging in the APM UI of Datadog (where I can see all the other traces of my app), while they were reported in the env: none.

To fix this env issue, I set ResourceAttributes.DEPLOYMENT_ENVIRONMENT on the Resource used by my SdkTracerProvider instance:

val serviceName = "my-service"
val env = "staging"
val resource =
  Resource
    .builder()
    .put(ResourceAttributes.SERVICE_NAME, serviceName)
    .put(ResourceAttributes.DEPLOYMENT_ENVIRONMENT, env)
    .build()

...    

val tracerProvider =
  SdkTracerProvider
    .builder()
    .addSpanProcessor(spanProcessor)
    .setResource(resource)
    .build()
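
For a .NET 6 app like the one in the original post, a roughly equivalent sketch would set the same attribute on the OpenTelemetry .NET ResourceBuilder (the AddAttributes call and the "staging" value here are illustrative assumptions, not part of the original setup):

// Requires: using System; using System.Collections.Generic;
// using OpenTelemetry.Resources; using OpenTelemetry.Trace;
services.AddOpenTelemetryTracing(builder =>
{
    builder
        .SetResourceBuilder(
            ResourceBuilder.CreateDefault()
                .AddService(serviceName: "MyService", serviceVersion: "1.0.0")
                // deployment.environment maps to the env tag in the Datadog APM UI;
                // "staging" is only an example value.
                .AddAttributes(new KeyValuePair<string, object>[]
                {
                    new("deployment.environment", "staging"),
                }))
        .AddOtlpExporter(config =>
        {
            config.Endpoint = new Uri("http://localhost:4317");
        });
});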

In my case the issue was that the trace-agent (the component that listens on 5003/tcp) is disabled by default when using the Datadog operator. Traces were making it to the external OTLP endpoints (4317/4318) on the agent, then getting blackholed at the trace-agent port (5003).

Setting features.apm.enabled to true in my DatadogAgent manifest turned on the trace-agent container and fixed my issue.
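
For reference, a minimal sketch of the relevant part of such a manifest, assuming the operator's v2alpha1 DatadogAgent API (field names may differ across operator versions):

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    apm:
      enabled: true          # turns on the trace-agent container
    otlp:
      receiver:
        protocols:
          grpc:
            enabled: true
            endpoint: 0.0.0.0:4317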

@pj-datadog this may be helpful to document here for those of us that didn’t already have APM enabled: https://docs.datadoghq.com/opentelemetry/otlp_ingest_in_the_agent/?tab=host

imo it doesn’t really make sense that you can turn on the otlp receiver without also turning on the trace-agent.

@guizmaii currently you can set this manually via the environment variables in helm (https://docs.datadoghq.com/tracing/setup_overview/open_standards/otlp_ingest_in_the_agent/?tab=kuberneteshelm), but we are actively working on a dedicated configuration section to make this easier
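
For example, a sketch of the env-var approach in the chart's values.yaml (the datadog.env and datadog.apm.portEnabled keys are assumptions about the chart layout; the gRPC variable name matches the agent configuration discussed later in this thread, and the HTTP one follows the same pattern):

datadog:
  apm:
    portEnabled: true        # keep the trace-agent listening as well
  env:
    - name: DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT
      value: "0.0.0.0:4317"
    - name: DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT
      value: "0.0.0.0:4318"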

@iamharvey Today there are two methods a customer can use to send their telemetry data to Datadog.

Method 1: OTLP Ingest in Datadog Agent - A way to send telemetry data from OTel SDKs directly to the Datadog Agent.

Method 2: OTel Collector Datadog Exporter - A way to send telemetry data from OTel SDKs to OTel Collector, which exports the data to Datadog Backend via a Datadog Exporter.

If you are using the OTel Collector Datadog Exporter method, the release (GAing OTLP Ingest) will not affect your use.

NOTE: I am happy to announce that OTLP Ingest in Datadog Agent is now GA/Stable with Datadog Agent version 7.35
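
For example, a minimal datadog.yaml sketch of the GA configuration (key names inferred from the DD_OTLP_CONFIG_* environment variables mentioned elsewhere in this thread; the apm_config block reflects the point above that the trace-agent must also be enabled):

otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

apm_config:
  enabled: true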

hi folks, I’ve spent a good amount of time working on this topic recently and I’d like to share what I’ve found out here.

TL;DR version of the issue: the agent listens on 4317/4318 (OTLP receiver) correctly, but the traces were not sent to the trace-agent (on internal port 5003)

my setup:

  • “dd-agent”: version 7.35.0. OTLP is enabled via environment variables (tested with datadog.yaml as well, same result):
  • DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317
  • DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_TRANSPORT=tcp
  • I’ve verified that datadog is listening on port 4317 with netstat -apln | grep 4317
  • application: python flask + opentelemetry-python auto-instrumentation
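
A hedged docker-compose sketch of an agent configured this way (the image tag, DD_APM_ENABLED, and port mappings are illustrative assumptions; DD_API_KEY must be supplied):

services:
  datadog:
    image: gcr.io/datadoghq/agent:7.35.0
    environment:
      - DD_API_KEY=${DD_API_KEY}
      - DD_APM_ENABLED=true
      - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317
      - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_TRANSPORT=tcp
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "8126:8126"   # classic APM intake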

Before attempting to use OTLP ingestion on datadog-agent, my application was auto-instrumented with opentelemetry-exporter-datadog (which is being deprecated): I was able to see APM traces in my Datadog account.

When I replaced opentelemetry-exporter-datadog with opentelemetry-exporter-otlp (pointing to port 4317), I lost APM traces in my Datadog account. The same API key was used as before, and the rest of the application code remained the same.

some details for opentelemetry-exporter-otlp implementation:

  • environment variable OTEL_RESOURCE_ATTRIBUTES is set for my application:
    • OTEL_RESOURCE_ATTRIBUTES="service.name=my_app,deployment.environment=my_env,service.version=my_version"
  • compared to the implementation using opentelemetry-exporter-datadog:
    • service.name matches the value for DD_SERVICE
    • deployment.environment matches the value for DD_ENV
    • service.version matches the value for DD_VERSION
  • otlp_exporter = OTLPSpanExporter(endpoint="http://datadog:4317", insecure=True) (datadog is the hostname in the docker network)
  • span processor was configured (this is the same as using opentelemetry-exporter-datadog):
    • trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))
    • in addition to OTLP exporter, I also added a console exporter to debug
    • trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
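
Putting those pieces together, a consolidated sketch of that exporter setup (imports are from opentelemetry-sdk and opentelemetry-exporter-otlp; "datadog" is the agent's hostname in the docker network, as noted above):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Resource attributes (service.name, deployment.environment, service.version)
# are picked up from the OTEL_RESOURCE_ATTRIBUTES environment variable.
trace.set_tracer_provider(TracerProvider())

# Export over OTLP/gRPC to the agent's 4317 receiver.
otlp_exporter = OTLPSpanExporter(endpoint="http://datadog:4317", insecure=True)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))

# Console exporter alongside OTLP, for debugging.
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))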

what I observed:

  • telemetry data (traces from auto-instrumentation as well as manual spans) shows up correctly in the console output.
  • running tcpdump inside the datadog agent container (tcpdump -nnA -i any "port 4317" -s 0), I was able to see the correct data in plain text coming in on port 4317
  • verified that the agent is connected to the trace-agent on port 5003 (the internal gRPC server the agent uses to export traces to the trace-agent receiver)
  • running tcpdump inside the datadog agent container (tcpdump -nnA -i any "port 5003" -s 0), I don't see the telemetry data.

Since the agent ingests OTLP and exports to the trace-agent over OTLP, I tried pointing my application's OTLP exporter at port 5003 instead, and this failed too.

I've set DD_LOG_LEVEL to trace and inspected /var/log/datadog/agent.log and /var/log/datadog/trace-agent.log, but nothing looks suspicious. The only related thing is a "connection refused" error message when the agent tries to connect to the trace-agent before the trace-agent starts listening on the internal gRPC port.

It would be great to have a working example published by Datadog to help us understand what's missing for traces.