ApplicationInsights-dotnet: Flush doesn't block until messages are actually flushed

TelemetryClient.Flush() doesn't actually block until the messages are flushed. This means that if you want to flush messages during application shutdown (as in Microsoft/ApplicationInsights-aspnetcore#304), you have to add a Thread.Sleep(1000), which isn't ideal.

It would be nice if there were a blocking version of this API that took an optional timeout, so that it blocked only until the events were flushed or the timeout was hit, and application shutdown wouldn't be blocked any longer than necessary.

About this issue

  • Original URL
  • State: open
  • Created 7 years ago
  • Reactions: 55
  • Comments: 46 (12 by maintainers)


Most upvoted comments

Flush() should be synchronous. A user can easily turn a synchronous call into an asynchronous one, but not the other way around. Right now there is no way to guarantee that all messages are delivered.

The mention of Thread.Sleep(), even in Microsoft code samples, looks really unprofessional.
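
To illustrate that direction: if Flush() really blocked until delivery, a caller who wanted it off the current thread could trivially wrap it themselves. A minimal sketch, assuming a standalone console program using only the base Microsoft.ApplicationInsights package:

using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

var config = TelemetryConfiguration.CreateDefault();
var client = new TelemetryClient(config);

client.TrackTrace("about to shut down");

// A truly synchronous Flush() is easy to make asynchronous from the outside...
await Task.Run(() => client.Flush());

// ...but there is no way to make today's non-blocking Flush() synchronous from the outside.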

We also just ran into this - we lost some application crash exceptions. 😢

The current behavior doesn't seem intuitive or expected to me, so I'd say this should be treated as a "bug" instead of an "enhancement".

Also, it’s the most wanted issue in this repo, so it would be great if this could get some love soon!

Stumbled upon this thread while considering adding AI to our Windows Service app. A Flush() that may return before messages are actually flushed to AI is misleading to me. In considering AI, my thoughts were:

  • I’d like to use AI; I wonder if it’ll block for network on every Track* call. That would be bad.
  • Oh, there’s a Flush() method, so it must be batching stuff. Good!
  • That means I’ll have to call Flush() myself right before the app closes.

This thread seems to say that calling Flush() probably won’t accomplish that goal. @ohadschn mentioned, “In such cases, it’s the developer’s responsibility to implement this workaround,” and my response to that is: “The developer’s responsibility would be to call Flush(). Upon successful return of Flush() they have a reasonable expectation that they’re not going to lose any AI messages.”

That's my perspective on it, anyway. I agree with @cwe1ss that this is more of a bug than an enhancement, and wanted to add a comment because the thread seems to be going the way of, "There's maybe some value here, but it's not really broken, and devs are responsible for using a [non-obvious] workaround." I hope the perspective of someone new to AI helps.

This should be the API:

Task FlushAsync(CancellationToken);

You can await it, call Wait() on it, or cancel it.
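
A minimal sketch of how that could look at shutdown, assuming a Task-returning TelemetryClient.FlushAsync(CancellationToken) along the lines proposed above (hypothetical; it does not exist in the SDK at the time of this comment):

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;

static class TelemetryShutdown
{
    public static async Task FlushOnShutdownAsync(TelemetryClient client)
    {
        // Bound the wait so shutdown can never hang indefinitely.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
        {
            try
            {
                // Hypothetical API: completes only once buffered telemetry
                // has actually been delivered (or given up on).
                await client.FlushAsync(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Timeout hit: proceed with shutdown and accept possible loss.
            }
        }
    }
}

A caller that has to stay synchronous could still block on it with client.FlushAsync(token).GetAwaiter().GetResult().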

This is really important IMHO. Exceptions that are about to crash your process (and hence require flushing) are arguably the most important ones to track.

In addition, it seems like the current implementation falls between two stools - on one hand it’s synchronous so you block your thread (https://github.com/Microsoft/ApplicationInsights-dotnet/issues/482), but on the other hand delivery isn’t guaranteed so it might as well have been asynchronous (ideally returning a Task you could wait on).

Like the OP said, as things stand now you have to add some random Thread.Sleep(N) where of course there is really no way to know what N should be…
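
For reference, the workaround pattern in question looks roughly like this (the delay is an arbitrary guess, which is exactly the problem):

using System;
using System.Threading;
using Microsoft.ApplicationInsights;

static class CurrentWorkaround
{
    public static void FlushBeforeExit(TelemetryClient client)
    {
        // Hands buffered telemetry to the channel; with ServerTelemetryChannel
        // this can return before anything reaches the ingestion endpoint.
        client.Flush();

        // Sleep for some N and hope the channel finishes sending in time.
        Thread.Sleep(TimeSpan.FromSeconds(1));
    }
}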

The situation is still not clarified and the docs still require Task.Delay(5000).

I upvote this issue too. It is not intuitive for the asynchronous Flush to be named Flush instead of FlushAsync, let alone that we lack a synchronous implementation too 😦.

Any update on this [^1]?

[^1]: 5 days before the issue gets closed…

That still doesn’t guarantee that the events are sent to the ingestion endpoint. The ServerTelemetryChannel has a Transmitter with its own buffering. TelemetryBuffer.FlushAsync() merely ensures that events are handed off to the Transmitter.
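
To illustrate the layering described above with a deliberately simplified sketch (not the SDK's actual code): flushing the first buffer only hands items to a second stage that still sends on its own schedule, so the returned task says nothing about delivery.

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified stand-ins for the buffer/transmitter split described above.
class TransmitterSketch
{
    private readonly ConcurrentQueue<string> _pending = new ConcurrentQueue<string>();

    // Items are accepted immediately; actual network sends happen later,
    // driven by the transmitter's own timers and policies.
    public void Enqueue(string item) => _pending.Enqueue(item);
}

class TelemetryBufferSketch
{
    private readonly List<string> _items = new List<string>();
    private readonly TransmitterSketch _transmitter = new TransmitterSketch();

    public void Add(string item) => _items.Add(item);

    // Completes as soon as items are handed off -- nothing has been sent yet.
    public Task FlushAsync()
    {
        foreach (var item in _items)
        {
            _transmitter.Enqueue(item);
        }
        _items.Clear();
        return Task.CompletedTask;
    }
}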

@cwe1ss In AI’s defense, almost everyone will be covered by the unhandled exception module which is enabled by default in Microsoft.ApplicationInsights.WindowsServer and Microsoft.ApplicationInsights.Web (which depends on the former), by far the most prevalent AI packages (according to nuget.org statistics). Of course there are still many users of other packages, such as ASP.NET Core where this module won’t be enabled by default. And there are cases (like Service Fabric) where unhandled exceptions won’t reach the top stack frame and thus won’t be caught by the module (plus there’s no special exception handling in the AI SF package). In such cases, it’s the developer’s responsibility to implement this workaround.

As for TelemetryClient : IDisposable, it makes sense for sure, though keep in mind that many developers probably have static telemetry clients (indeed, the official docs recommend "an instance of TelemetryClient for each module of your app"), where it won't help much. Maybe AI should hook AppDomain.ProcessExit.
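
To make that last suggestion concrete, here is a rough sketch of hooking ProcessExit from application code (not something the SDK does today, as far as I know; the sleep is the same arbitrary-N compromise this issue is about):

using System;
using System.Threading;
using Microsoft.ApplicationInsights;

static class ProcessExitFlushHook
{
    // Call once at startup for each (possibly static) TelemetryClient.
    public static void Register(TelemetryClient client)
    {
        AppDomain.CurrentDomain.ProcessExit += (sender, args) =>
        {
            client.Flush();

            // With an asynchronous channel we still have to guess how long
            // draining takes; ProcessExit handlers also have a limited time budget.
            Thread.Sleep(TimeSpan.FromSeconds(1));
        };
    }
}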

I’ve come to this issue fresh, in September 2023, after seeing the sample code https://learn.microsoft.com/en-us/azure/azure-monitor/app/worker-service#net-corenet-framework-console-application (different URL than given in various links above). That page has sample code for a console app to use AppInsights, which includes the delay after the flush, but makes no reference to this issue. I was going to raise the issue myself (as I don’t want my console app to have to wait for an arbitrary duration), and found this issue.

From the information above, it still isn’t clear whether this is resolved or not, although https://learn.microsoft.com/en-us/azure/azure-monitor/app/telemetry-channels#built-in-telemetry-channels makes it clear that one implementation is synchronous, and the other is asynchronous.

On a conceptual point, I wonder whether this discrepancy between the implementations counts as a breach of the Liskov Substitution Principle (the "L" in SOLID), because they both implement the same interface, but you can't substitute one implementation for the other without breaking correctness.
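
For a console app that mainly cares about Flush() blocking, one option suggested by that channels page is to configure the synchronous InMemoryChannel explicitly, trading away the server channel's retry and local-storage features. A sketch, assuming a recent base Microsoft.ApplicationInsights package where TelemetryConfiguration.ConnectionString is available (the connection string is a placeholder):

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;

class Program
{
    static void Main()
    {
        var config = TelemetryConfiguration.CreateDefault();
        config.ConnectionString = "<your connection string>";

        // InMemoryChannel's Flush is synchronous, per the docs linked above.
        config.TelemetryChannel = new InMemoryChannel();

        var client = new TelemetryClient(config);
        client.TrackTrace("console app finished");

        // With InMemoryChannel this blocks while the buffered batch is sent.
        client.Flush();
    }
}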

Before closing this issue, please also update the getting-started documentation for Application Insights to reflect that this issue is no longer active. That documentation points here:

There is an active issue regarding the need for Sleep which is tracked here: ApplicationInsights-dotnet/issues/407

However, I find this discussion rather amusing, with some folks here asking for a blocking log capability. Why would one want to slow down the application…

The reason is in the second sentence of the OP: you are about to terminate the process, but you don't want to do that until the log has been fully flushed. If flush doesn't block, it is probable that you will lose the last few messages. The problem is made worse by two things:

  1. Legacy logging frameworks usually blocked until flush had written all logs to persistent storage. A developer accustomed to that will think calling flush gets all logs safely written out, but in this case, they would be mistaken. At least, that was my experience.
  2. If you’re terminating a process, it may be that the last few milliseconds of logs are really important.

I hope this explanation helps you return from being taken aback. 😃 I confess to also being a little amused since the question you asked in your second paragraph seems to me to be answered in your first paragraph, but maybe I misunderstood you.

Oh, I didn't notice that. OK, so any clarification of the Flush + Sleep combination's usage after the new release containing this PR would be helpful.

@cijothomas @SergeyKanzhelev I wonder if something like this could be used as a workaround for the case of an exception about to crash the process?

try
{
    // something important
}
catch (Exception fatal)
{
    // retain original channel
    var serverChannel = TelemetryConfiguration.Active.TelemetryChannel;

    // switch to InMemoryChannel for guaranteed delivery of the offending exception
    TelemetryConfiguration.Active.TelemetryChannel = new InMemoryChannel();
    telemetryClient.TrackException(fatal);
    telemetryClient.Flush();

    // now that the most important part was tracked, optimistically try and drain the queue
    serverChannel.Flush();
    Thread.Sleep(1000);

    // shutdown/crash process
    throw;
}