apm-agent-dotnet: Elasticsearch operations produce span retrieve error in distributed transactions

Describe the bug Hi,

I didn’t notice this with previous versions of APM, but right know if I create a manual transaction with an existing distributed tracing id with inner spans, on each Elasticsearch (Nest lib) I get the following errors from APM like Failed to find current span in ConcurrentDictionary

To Reproduce Steps to reproduce the behavior: 1.My use case is the following : App1 is publishing a message to Kafka, in the message I set a distributed correlationId using the following :

        public static string GetCorrelationLogsId() => 
                (Agent.Tracer.CurrentSpan?.OutgoingDistributedTracingData
                    ?? Agent.Tracer.CurrentTransaction?.OutgoingDistributedTracingData)?.SerializeToString();

Let’s say the string returned is 00-67fe36868e7ce8498bc0195b3a0acb4f-f867f0376eec7a49-01, it’s then set into the message before sending it to Kafka.

  1. Then App2 is consuming the message, and use the correlationId to open a new manual transaction
            await Agent.Tracer
              .CaptureTransaction("myTransaction", ApiConstants.SubtypeElasticsearch, async (t) =>
              {
                      ... some code
                      await t.CaptureSpan("PersistEntity", ApiConstants.SubtypeElasticsearch, async span =>
                      {
                          _logger.LogDebug("Persist entity in Elk with Id {id}", mymessage.Id);
                          await _elasticsearchRepository.Add(mymessage);  // produces errors !
                      });
                       ... some code
            }, DistributedTracingData.TryDeserializeFromString(mymessage.DistributedTracingData));  // 00-67fe36868e7ce8498bc0195b3a0acb4f-f867f0376eec7a49-01
  1. The call of Elastic repository method produces the following logs
[07:51:17 INF] {RequestPipelineDiagnosticsListener} Received an CallElasticsearch.Start event from elasticsearch
[07:51:17 INF] {SerializerDiagnosticsListener} Received an Serialize.Start event from elasticsearch
[07:51:17 ERR] {SerializerDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 INF] {HttpConnectionDiagnosticsListener} Received an SendAndReceiveHeaders.Start event from elasticsearch
[07:51:17 ERR] {HttpConnectionDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 INF] {HttpConnectionDiagnosticsListener} Received an ReceiveBody.Start event from elasticsearch
[07:51:17 INF] {SerializerDiagnosticsListener} Received an Deserialize.Start event from elasticsearch
[07:51:17 ERR] {SerializerDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-dc31ed64b33f0145-01
[07:51:17 ERR] {HttpConnectionDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 ERR] {RequestPipelineDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-302847dec737e94c-01

=> the correlation id reported in logs are not the same as the one used to initiate the transaction (ie : transported in message payload)

  1. The correlation Id seems to be correctly used thanks to the APM trace sample since both applications appear in the same transaction image

Versions and config Here are the current versions of used libs

    <TargetFramework>net5.0</TargetFramework>
    <PackageReference Include="Elastic.Apm.NetCoreAll" Version="1.7.1" />
    <PackageReference Include="Elastic.Apm.SerilogEnricher" Version="1.5.1" />
    <PackageReference Include="Elastic.CommonSchema.Serilog" Version="1.5.1" />
    <PackageReference Include="NEST" Version="7.10.1" />

And my appsettings.json

  "ElasticApm": {
    "ServerUrls": "http://apm:8200",
    "TransactionMaxSpans": 5000,
    "CaptureBody": "all",
    "CloudProvider": "none"
  }

Expected behavior I expect no error from APM or to understand what I did wrong

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 21 (8 by maintainers)

Commits related to this issue

Most upvoted comments

@NicolasREY69330 - We’ve just pushed packages for 7.11.1 which include a fix for this bug.

I’ve opened https://github.com/elastic/elasticsearch-net/pull/5326 to address.

I’ve outlined the problem in the PR; it’s a rather pernicious and subtle bug that was introduced when the clients were updated to System.Diagnostics.DiagnosticSource, and stems from Activity now implementing Dispose() and setting Activity.Current to the parent Activity on Dispose(), which ends up happening when the derived Diagnostic types in the client are disposed, and before it notifies any Diagnostic Listeners of the stopping of the activity that happens in the overridden Dispose(bool).

Thank you for the repro @NicolasReyDotNet, I can reproduce the issue. Looking into

Thank you for your help

@russcam Could you reproduce the problem with my repo ?