runtime: DiagnosticsHandler doesn't support W3C trace context correctly with aspnetcore
I have two aspnetcore applications using the W3C activity id format. When I have one call the other, the tracing context in the logs is missing a link in the chain.
Expected behavior Trace id is the same in the logs of both services. Parent id in the logs for the second service matches the span id in the logs of the first service.
Observed behavior Trace id is the same in the logs of both services. Parent id in the logs for the second service does not match the span id in the logs of the first service.
This is due to what is happening in the DiagnosticsHandler.
On line 52, the DiagnosticListener (s_diagnosticListener) is queried for whether it’s enabled. This is always false for .net core. In this statement, a new activity is started and not logged. The part of the method that we want to run (line 95-99) is never run.
I have currently hacked this to work in my project with the following code
internal static class TraceContextHack
{
private static readonly string _diagnosticsHandlerTypeFullName = "System.Net.Http.DiagnosticsHandler";
private static readonly string _diagnosticListenerFieldName = "s_diagnosticListener";
public static void Initialize()
{
var handlerType = typeof(HttpClient).Assembly.GetType(_diagnosticsHandlerTypeFullName);
var listenerField = handlerType.GetField(_diagnosticListenerFieldName, BindingFlags.NonPublic | BindingFlags.Static);
var listener = listenerField.GetValue(null) as DiagnosticListener;
if (!listener.IsEnabled())
listener.Subscribe(new NullObserver(), _ => false);
}
}
class NullObserver : IObserver<KeyValuePair<string, object>>
{
public void OnCompleted()
{
}
public void OnError(Exception error)
{
}
public void OnNext(KeyValuePair<string, object> value)
{
}
}
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 15 (7 by maintainers)
I agree that the Activity system within the .NET context makes sense that anybody anywhere at anytime can create a “sub” Activity to embody a piece of work. However, the original poster and others who have “me too” responded have a VERY valid point and I’ve spent the better part of two days tracking this down. We have services that are logging trace-id, parent-id, and span-id so that we can do correlation from logs – and when sub activities go UNLOGGED you suddenly have child spans that link to non-existent parents. That just doesn’t work.
There’s two big issues at play for me here:
First, the W3C standard expects that you have logging for every parent/child in the hierarchy. By default this logging is NOT working as the original poster has shown. Something has to be done so that we can trivially hook into and log EVERY Activity that is created in the flow of processing a request in order to prevent some-third-party-library from messing up our logging of this crucial information.
Second, the work around provided exposes a really BAD “side effect” driven behavior of the DiagnosticHandler. In order to disable the creation of the Activity inside of the DiagnosticHandler you have to meet two conditions: First, you MUST have a registered listener on
System.Net.Http.DiagnosticsHandler
; this triggers this if block to fail and not create the Activity found within. Second, this handler MUST have a predicate that returns False when asked if it handles theSystem.Net.Http.HttpRequestOut
event; this triggers this if block to fail. By having both of those if blocks fail, no sub Activity is generated.@gislikonrad bravo for figuring that out, any many thanks for your discovery!
But, I want to go back to how this is using a REALLY BAD side effect driven behavior. Let’s look at a few scenarios:
System.Net.Http.DiagnosticsHandler
results in new ActivitySystem.Net.Http.DiagnosticsHandler
that returns False for handlingSystem.Net.Http.HttpRequestOut
events results in NO new Activity.System.Net.Http.DiagnosticsHandler
that returns True for handlingSystem.Net.Http.HttpRequestOut
events results in new Activity.My primary complaint is that the first two scenarios should NOT be different – that is violation of any sense of consistency.
I appreciate the work around, but I’m concerned without a proper solution that we’re all going to suffer massive failures if A) they ever fix this violation of consistency or B) any of us add some third-party library that registers a listener that handles that event (maybe a library for doing automatic telemetry)… As soon as we add that type of library into our app boom these unlogged Activities come back.
The only way I have to work around that is to change the Null observer into one that actually handles the event and logs the new
Activity.Current.*
information. That keeps us out of the workaround that could fail under either of the above two scenarios.That’s at least my findings after two days of hunting for this problem. I don’t understand why this default behavior is acceptable in a world of W3C tracing, and I’m glad to find others hitting the same problem.