runtime: ActivityId missing in a newly added HttpTelemetry Stop events.

In non-trivial number of cases, the RequestStop event has empty activity id.

9 945 out of 35 452 requests sent during HTTP FunctionalTests were missing activity id.

Other Start/Stop event pairs introduced with the new telemetry are not properly linked with ActivityId as well.

Also easily reproducible with a simple HttpClient scenario:

class Program
{
    static async Task Main(string[] args)
    {
        using var listener = new HttpEventListener();
        using var client = new HttpClient();
        
        await client.GetAsync(@"https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Http/src/HttpDiagnosticsGuide.md");
    }
}

internal sealed class HttpEventListener : EventListener
{
    public const EventKeywords TasksFlowActivityIds = (EventKeywords)0x80;
    //public const EventKeywords Debug = (EventKeywords)0x20000;

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "System.Net.Http" ||
            eventSource.Name == "System.Net.Sockets" ||
            eventSource.Name == "System.Net.Security" ||
            eventSource.Name == "System.Net.NameResolution")
        {
            EnableEvents(eventSource, EventLevel.LogAlways);
        }
        else if (eventSource.Name == "System.Threading.Tasks.TplEventSource")
        {
            EnableEvents(eventSource, EventLevel.LogAlways, TasksFlowActivityIds /*| Debug*/);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        var sb = new StringBuilder().Append($"{eventData.TimeStamp:HH:mm:ss.fffffff}[{eventData.EventName}]{eventData.ActivityId}.{eventData.RelatedActivityId} ");
        for (int i = 0; i < eventData.Payload?.Count; i++)
        {
            if (i > 0)
                sb.Append(", ");
            sb.Append(eventData.PayloadNames?[i]).Append(": ").Append(eventData.Payload[i]);
        }
        Console.WriteLine(sb.ToString());
    }
}

Result (with the change in https://github.com/dotnet/runtime/pull/41022)

09:34:45.3129327[RequestStart]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000 scheme: https, host: github.com, port: 443, pathAndQuery: /dotnet/runtime/blob/master/src/libraries/System.Net.Http/src/HttpDiagnosticsGuide.md, versionMajor: 1, versionMinor: 1
09:34:45.4344957[ResolutionStart]00001011-0000-0000-0000-0000090f9d59.00000011-0000-0000-0000-0000093f9d59 hostNameOrAddress: github.com
09:34:45.4752982[ResolutionStop]00001011-0000-0000-0000-0000090f9d59.00000000-0000-0000-0000-000000000000 
09:34:45.4848747[ConnectStart]00002011-0000-0000-0000-0000091f9d59.00000011-0000-0000-0000-0000093f9d59 address: InterNetworkV6:28:{1,187,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,140,82,118,3,0,0,0,0}
09:34:45.5208470[ConnectStop]00002011-0000-0000-0000-0000091f9d59.00000000-0000-0000-0000-000000000000 
09:34:45.5347626[HandshakeStart]00003011-0000-0000-0000-0000096f9d59.00000011-0000-0000-0000-0000093f9d59 isServer: False, targetHost: github.com
09:34:45.8030641[HandshakeStop]00003011-0000-0000-0000-0000096f9d59.00000000-0000-0000-0000-000000000000 protocol: Tls13
09:34:45.8070942[Http11ConnectionEstablished]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000 
09:34:46.2040073[ResponseHeadersBegin]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000 
09:34:46.3278444[RequestStop]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000

~Un-paired Stop events:~ ~- ResolutionStop: ran in continuation: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs#L485~ ~- ConnectStop: ran in SocketsEventArgs: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.cs#L744~ ~- HandshakeStop: ran in continuation: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs#L240~ They’re paired, I misinterpreted the logs. The Stop event doesn’t carry parent activity id, only the start one.

What needs to be done to achieve desired statistics:

  • Fix Request events #41022
  • ~Fix Resolution events~ wrongly interpreted logs
  • ~Fix Connect events~ wrongly interpreted logs
  • ~Fix Handshake events~ wrongly interpreted logs
  • Add events (Start/Stop) for request headers sent
  • Add events (Start/Stop) for request body sent
  • Add events (Start/Stop) for response headers received
  • Add events (Start/Stop) for response body received when we buffer it #41022
  • Add events (Start/Stop) to convenience methods using ResponseHeadersRead for the whole method #41022
  • Add events (Start/Stop) to convenience methods for the response stream reading (the same pair as in for response body received) #41022
  • Document how to add custom Start/Stop events for users using ResponseHeadersRead that want to properly time response body reading (based on our solution for convenience methods)

We can also add or leave existing unpaired telemetry (though it shouldn’t be named Start/Stop then) for events that cannot be bound by a single async context. We can link them to the Start/Stop events manually either via remembering ActivityId or using some other custom ID.

cc: @samsp-msft @MihaZupan @karelz

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 19 (15 by maintainers)

Most upvoted comments

Fixed in 6.0/master (PR #41022) and in 5.0 (PR #41610).

For 5, I’d recommend we scope down and send the stop at the end of SendAsync, and make it clear either with the event name, or parameters in the payload as to what type of request it is, and so what the Stop will actually correspond to.

One could save EventSource.CurrentThreadActivityId after the start event …

What @KalleOlaviNiemitalo suggests would work to set the activity id of the stop event, however there is also the consideration of what activity id is attached to all events that happen to occur in between the start and stop. Perhaps we are in position to know that we can control all of those events because we own 100% of the code that executes between those two points? I didn’t get a chance to read all the updates above and it is almost 5:30am so I need to sleep before I’ll be able to do a good job thinking about this. When I’m awake again I’ll dig into it further : )

for read-headers it wouldn’t.

The idea is that it would be captured in the user-provided callback.

Essentially if a user had code like

Task DoSomethingWithResponseAsync(HttpResponseMessage response) {}
var response = await client.SendAsync(request, completionOption);
await DoSomethingWithResponseAsync(response);

it would turn into

await client.SendAndProcessWithTelemetryAsync(request, completionOption, DoSomethingWithResponseAsync);

That way the Telemetry Start/Stop would be in the enclosing scope of the response processing.

@ManickaP nice investigating 👍 Indeed as you observed, ActivityId is async-local so it flows from parent to child when creating a Task but changes do not flow back from child to parent when an await completes.

I’m not knowledgeable about the networking code, but is there an option to call Start() sooner so that it runs in the same Task where Stop() will eventually be called? Other more customized solutions may also be possible, but I’ve never attempted to author one and my suspicion is it would get messy.