diagnostics: dotnet-gcdump failed to collect dump

I’ve been using this to investigate https://github.com/aspnet/AspNetCore/issues/13299, and just wanted to mention that once when doing dotnet-gcdump collect, it failed with this exception:

$ dotnet-gcdump collect --process-id 1
Writing gcdump to '/app/20191028_114158_1.gcdump'...
[ERROR] System.ApplicationException: RootIndex not set.
   at Graphs.Graph.AllowReading() in /app/diagnostics/src/Tools/dotnet-gcdump/DotNetHeapDump/Graph.cs:line 236
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.<>c__DisplayClass1_0.<Collect>b__0() in /app/diagnostics/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 58
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, String output, Boolean verbose) in /app/diagnostics/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 63

Then I retried, and it worked.

Update: I now have an instance on which I’m consistently getting this exception, even if I wait and try again.

_Originally posted by @markvincze in https://github.com/dotnet/diagnostics/pull/581#issuecomment-546891045_

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

This issue still exists if there is almost no free RAM left and you are running dotnet-gcdump collect -p xxxxx.

The tool needs to be changed to use as little RAM as possible during collection.

Hi @josalem,
I’m testing the latest version (d4155991 at the time I built my service), and so far it works fine; I couldn’t reproduce the exception any more.

@markvincze, I’m opening up this issue to track this failure, so that we don’t block the addition of the gcdump tool.

Could you expand on the environment where you are routinely able to reproduce this failure?

Info that would be helpful:

  • architecture
  • OS + version
  • base container image used
  • configuration of the tool and target, e.g., was the tool installed inside the container
  • CPU and memory usage of the target process when this failure occurs
  • is the target process deadlocked after this failure?
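
In case it helps, here is a minimal sketch of how some of the items above (architecture, OS and version, memory usage of the target) could be gathered in one go. It uses only Python's standard library and assumes a Linux target where /proc is readable. The names `read_proc_field` and `collect_report` are hypothetical helpers, not part of dotnet-gcdump, and the sketch does not cover the container image, the tool configuration, or CPU usage.

```python
import json
import os
import platform
from pathlib import Path


def read_proc_field(path, field):
    """Return the value of `field` from a /proc-style 'Key: value' file, or None."""
    try:
        for line in Path(path).read_text().splitlines():
            key, _, value = line.partition(":")
            if key.strip() == field:
                return value.strip()
    except OSError:
        pass  # not Linux, or the file is missing or not readable
    return None


def collect_report(pid):
    """Gather basic environment facts for a bug report about process `pid`."""
    return {
        "architecture": platform.machine(),
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "target_pid": pid,
        # Linux-only (assumption): resident memory of the target process
        # and the memory still available system-wide.
        "target_vm_rss": read_proc_field(f"/proc/{pid}/status", "VmRSS"),
        "mem_available": read_proc_field("/proc/meminfo", "MemAvailable"),
    }


if __name__ == "__main__":
    # Reports on this script's own process; pass the target PID instead.
    print(json.dumps(collect_report(os.getpid()), indent=2))
```

Run it inside the same container as the target, replacing `os.getpid()` with the target's PID (1 in the report above), and paste the JSON output into the issue. On systems without /proc the two memory fields are simply reported as null.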

If possible, could you take a dump of the target process when you see this failure? If you can’t share the dump with me, please take a look in the native threads for any stacks referencing the DiagnosticsServer or EventPipe.

CC @sywhang