envoy: Missing "x-amzn-trace-id" (and other trace ids) in response headers

Title: Missing “x-amzn-trace-id” (and other trace ids) in response headers

Description:

This was originally reported against AWS AppMesh / Envoy for the X-Ray integration (https://github.com/aws/aws-app-mesh-roadmap/issues/394), but further evaluation by @suniltheta showed that it affects any tracing extension and results from how Envoy’s support for tracing extensions is built internally. Long story short: Envoy doesn’t add tracing headers to the response if they’re missing. As a result, returning trace headers to the client relies entirely on whether the application container uses a tracing SDK (e.g. the X-Ray SDK) or is otherwise configured to propagate tracing headers from request to response.

Repro steps:

  1. Deploy an ECS Fargate service with Envoy, an X-Ray sidecar, and a raw, unconfigured nginx container acting as the “application container”.
  2. Configure Envoy to send traces to X-Ray.
  3. Send a bunch of requests to this service using curl -v ...
  4. Observe the following:
     4.1. Traces of the requests are visible in X-Ray.
     4.2. Nginx sees the x-amzn-trace-id header, but is not configured to send it back (also, see “Why I think…” below).
     4.3. The response, as seen by curl, doesn’t show the x-amzn-trace-id header.
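
The check in step 4.3 could also be scripted instead of read off the curl output. The following is only a minimal sketch in Python using the standard library. The function names are hypothetical, and every header name other than x-amzn-trace-id is an assumption added to cover other common tracers (W3C Trace Context, Zipkin B3, Jaeger).

```python
import urllib.error
import urllib.request

# Only x-amzn-trace-id comes from the report. The other names are assumed
# examples for other tracers and may not match a given setup.
TRACE_HEADERS = ("x-amzn-trace-id", "traceparent", "x-b3-traceid", "uber-trace-id")


def present_trace_headers(headers):
    """Return the trace headers found in `headers`, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


def fetch_response_headers(url):
    """Fetch `url` and return its response headers, including for 4xx/5xx replies."""
    try:
        with urllib.request.urlopen(url) as resp:
            return dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses, but the response headers are still available.
        return dict(err.headers.items())
```

Calling `present_trace_headers(fetch_response_headers(url))` against the service endpoint would return an empty dict in the situation described above, since no trace header reaches the client.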

In https://github.com/aws/aws-app-mesh-roadmap/issues/394 I describe in much more detail the configurations that were tested using ECS and AppMesh. @suniltheta also describes how he tested this with X-Ray and Jaeger.

Why I think this is a bug

In the ideal scenario, where our “application container” is properly configured with the X-Ray SDK or otherwise copies tracing headers from request to response, all is fine. But in that ideal scenario we rarely need the trace ID. Things get ugly when requests start failing. Consider the following scenarios:

  1. The “application service” times out and Envoy responds with a 504.
  2. The “application service” fails with some internal error and responds with a 5xx response code, but does not provide a trace ID in the response.
  3. The “application service” contains a misconfigured X-Ray SDK, and the tracing headers are not copied to the response.
  4. The “application service” is a third-party, closed-source app that cannot be configured to copy these headers.

In any of these cases the client is unable to provide us with a meaningful trace ID to debug issues with the request. I believe that once Envoy has initiated a new trace on the incoming request, it should be Envoy’s responsibility to ensure that the ID of that trace is returned to the client. I believe Envoy should delegate the check for the existence of the trace ID to the configured tracing extensions not only during request processing, but also during response processing.
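
To make the requested behavior concrete, here is a rough sketch in Python of the response-path check. It is not Envoy code and does not reflect Envoy’s internal tracer interface: `TraceContext` and `ensure_trace_header` are hypothetical names, and the sketch assumes the tracing extension can report the header name and value it injected on the request path.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TraceContext:
    """Hypothetical stand-in for the trace the proxy started on the request path."""
    header_name: str   # e.g. "x-amzn-trace-id" when the X-Ray tracer is configured
    header_value: str


def ensure_trace_header(
    response_headers: Dict[str, str], ctx: Optional[TraceContext]
) -> Dict[str, str]:
    """Return a copy of the response headers that carries the trace ID.

    A value already set by the upstream application is left untouched; the
    header is added only when it is missing and a trace was started.
    """
    result = dict(response_headers)
    if ctx is None:
        return result  # no trace was started, so there is nothing to add
    existing = {name.lower() for name in result}
    if ctx.header_name.lower() not in existing:
        result[ctx.header_name] = ctx.header_value
    return result
```

Under these assumptions the same check would cover all four scenarios above, including the locally generated 504, because it runs on the response regardless of where the response came from.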

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 20 (10 by maintainers)

Most upvoted comments

Having said that, I am not sure whether it was a conscious design decision, made in the early days of Envoy’s tracing support, not to explicitly add the trace context again on the response path. I am putting this question to the community to seek their opinion. If this is not an issue then I can work on the implementation as well.

In general, it is not a requirement of some (most?) tracing systems for responses to be annotated with a header. Spans are opened on the request path and then finalized at some later time. Each hop owns its own span start/stop logic, so nothing has to be returned to the previous hop to make this happen.

I’m not sure if this can be done generically for all tracers, but it seems reasonable to me to allow a tracer to be configured to send back headers at finalization time if it chooses. For example, if there is no response with sufficient context, local context can be added.

/assign

/assign suniltheta