envoy: Missing "x-amzn-trace-id" (and other trace ids) in response headers
Title: Missing “x-amzn-trace-id” (and other trace ids) in response headers
Description:
This was originally reported against AWS AppMesh / Envoy for X-Ray integration (https://github.com/aws/aws-app-mesh-roadmap/issues/394), but further evaluation by @suniltheta showed that it affects any tracing extension and results from how Envoy’s support for tracing extensions is built internally. Long story short: Envoy doesn’t add tracing headers to the response if they’re missing. As a result, whether trace headers are returned depends entirely on whether the application container uses a tracing SDK (e.g. the X-Ray SDK) or is otherwise configured to propagate tracing headers from the request to the response.
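For a Python application container, the "otherwise configured to propagate tracing headers" workaround could look roughly like the WSGI middleware below. It is a minimal sketch, not part of Envoy or the X-Ray SDK. The function name `echo_trace_headers` and the `TRACE_HEADERS` tuple are hypothetical. It assumes the trace header arrives on the request as `X-Amzn-Trace-Id`, which WSGI exposes as `HTTP_X_AMZN_TRACE_ID`.

```python
# Hypothetical list of request headers to echo back; add e.g. "traceparent"
# if another tracer is in use.
TRACE_HEADERS = ("X-Amzn-Trace-Id",)


def echo_trace_headers(app):
    """WSGI middleware (sketch): copy trace headers from the request into the
    response, but only when the wrapped application did not set them itself."""

    def middleware(environ, start_response):
        def start_response_with_trace(status, headers, exc_info=None):
            present = {name.lower() for name, _ in headers}
            extra = []
            for header in TRACE_HEADERS:
                # WSGI exposes request headers as HTTP_<UPPER_SNAKE_CASE>.
                key = "HTTP_" + header.upper().replace("-", "_")
                value = environ.get(key)
                if value is not None and header.lower() not in present:
                    extra.append((header, value))
            return start_response(status, list(headers) + extra, exc_info)

        return app(environ, start_response_with_trace)

    return middleware
```

The workaround only works when the application actually produces the response. It cannot help when Envoy itself generates the reply, for example a 504 after an upstream timeout. That is why the issue asks Envoy to handle this.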
Repro steps:
1. Deploy an ECS Fargate Service with Envoy, an X-Ray sidecar and a raw, unconfigured nginx container acting as the “application container”.
2. Configure Envoy to send traces to X-Ray.
3. Send a bunch of requests to this service using curl -v ...
4. Observe the following:
   4.1. Traces of the requests are visible in X-Ray.
   4.2. Nginx sees the x-amzn-trace-id header, but is not configured to send it back (also, see “Why I think this is a bug” below).
   4.3. The response, as seen by curl, doesn’t show the x-amzn-trace-id header.

In https://github.com/aws/aws-app-mesh-roadmap/issues/394 I present the configurations that were tested using ECS and AppMesh in much more detail. @suniltheta also describes how he tested this with X-Ray and Jaeger.
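You can script steps 3 and 4.3 instead of reading `curl -v` output by eye. The probe below is a hypothetical helper, not part of the original repro. It assumes the service URL is passed as the only command-line argument. It also treats 4xx/5xx responses (such as an Envoy-generated 504) as results to inspect, not as failures.

```python
import sys
import urllib.error
import urllib.request
from typing import Iterable, Optional, Tuple

TRACE_HEADER = "x-amzn-trace-id"


def find_trace_header(headers: Iterable[Tuple[str, str]],
                      name: str = TRACE_HEADER) -> Optional[str]:
    """Return the value of the trace header (case-insensitive), or None."""
    for key, value in headers:
        if key.lower() == name:
            return value
    return None


def probe(url: str) -> None:
    """Send one request and report whether the response carries the trace header."""
    try:
        resp = urllib.request.urlopen(url, timeout=10)
    except urllib.error.HTTPError as err:
        # Error responses still have headers; this is the case the issue is about.
        resp = err
    try:
        value = find_trace_header(resp.headers.items())
        print(resp.getcode(), TRACE_HEADER, "=", value if value else "<missing>")
    finally:
        resp.close()


if __name__ == "__main__":
    probe(sys.argv[1])
```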
Why do I think this is a bug?
In the ideal scenario, where our “application container” is properly configured with the X-Ray SDK or is otherwise configured to copy trace headers from the request to the response, all is fine. But in that ideal scenario, we rarely ever need the trace ID. Things get ugly when requests start failing. Consider the following scenarios:
- The “application service” times out and Envoy responds with a 504.
- The “application service” fails with some internal error and responds with a 5xx response code, but does not provide the trace ID in the response.
- The “application service” contains a misconfigured X-Ray SDK, and the tracing headers are not copied to the response.
- The “application service” is a 3rd-party, closed-source app that cannot be configured to copy these headers.
In any of these cases, the client cannot give us a meaningful trace ID to debug issues with the request. I believe that once Envoy has initiated a new trace for an incoming request, it should be Envoy’s responsibility to ensure that the ID of that trace is returned to the client. Envoy should delegate the check for the existence of the trace ID to the configured tracing extensions during both request processing and response processing.
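The proposal amounts to giving each tracing extension a response-side hook next to its request-side one. The core would call that hook for every downstream response, including replies Envoy generates locally. The Python sketch below illustrates the idea only. Envoy's real tracer API is C++ and differs from this. `TracingExtension`, `inject_response`, `XRayExtension` and `finalize_response` are all hypothetical names, and the X-Ray header value is simplified.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Span:
    trace_id: str
    span_id: str


class TracingExtension(ABC):
    """Hypothetical extension interface; not Envoy's actual C++ API."""

    @abstractmethod
    def inject_request(self, span: Span, headers: Dict[str, str]) -> None:
        """Existing behavior: add trace context to the upstream request."""

    @abstractmethod
    def inject_response(self, span: Span, headers: Dict[str, str]) -> None:
        """Proposed hook: add trace context to the downstream response."""


class XRayExtension(TracingExtension):
    # Header names are assumed to be lowercased (as in HTTP/2).
    HEADER = "x-amzn-trace-id"

    def inject_request(self, span, headers):
        headers[self.HEADER] = f"Root={span.trace_id};Parent={span.span_id};Sampled=1"

    def inject_response(self, span, headers):
        # Keep whatever the upstream returned; only fill the gap.
        headers.setdefault(self.HEADER, f"Root={span.trace_id}")


def finalize_response(ext: TracingExtension, span: Span, status: int,
                      headers: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Called for every downstream response, including local replies such as
    a 504 generated when the upstream times out."""
    ext.inject_response(span, headers)
    return status, headers
```

Using `setdefault` means an application that already returns the header, such as one with a working X-Ray SDK, keeps its own value. Only the failure scenarios listed above get the header filled in.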
About this issue
- State: closed
- Created 2 years ago
- Comments: 20 (10 by maintainers)
In general, some (most?) tracing systems do not require responses to be annotated with a header. Spans are opened on the request path and then finalized at some later time. Each hop owns its own span start/stop logic, so nothing has to be returned to the previous hop to make this happen.
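The point about span ownership can be illustrated with a toy two-hop model. Each hop opens its span on the request path and passes context forward only in request headers. It then closes and reports its own span, and returns no trace context in the response. The header names `trace-id` and `parent-span-id` and the in-memory `COLLECTED` list are hypothetical stand-ins for a real propagation format and tracing backend.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Headers = Dict[str, str]


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: Optional[float] = None


COLLECTED: List[Span] = []  # stand-in for the tracing backend (X-Ray, Jaeger, ...)


def handle(hop: str, request_headers: Headers,
           downstream: Optional[Callable[[Headers], Headers]] = None) -> Headers:
    """Open a span on the request path, forward context, then close the span."""
    span = Span(
        trace_id=request_headers.get("trace-id") or uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=request_headers.get("parent-span-id"),
        name=hop,
        start=time.time(),
    )
    if downstream is not None:
        # Context travels forward on the request; the response is ignored here.
        downstream({"trace-id": span.trace_id, "parent-span-id": span.span_id})
    span.end = time.time()
    COLLECTED.append(span)  # each hop reports its own span
    return {}  # no trace context is needed in the response headers
```

For example, `handle("envoy", {}, downstream=lambda h: handle("app", h))` records two linked spans under one trace ID while returning empty response headers. That is why the trace itself is complete even when the client never sees the ID.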
I’m not sure if this can be done generically for all tracers, but it seems reasonable to me to allow a tracer to be configured to send back headers at finalization time if it chooses to. For example, if the response does not carry sufficient context, local context can be added.
/assign
/assign suniltheta