clrmd: Unable to analyze 4GB dumps of 32-bit processes

It seems ClrMD is not able to process 32-bit memory dumps larger than 2GB due to inconsistent address conversion between coreclr and ClrMD.

As @leculver and @mikem8361 already discussed in #280, using ulong types for address values may lead to unexpected behavior on particular platforms.

For your information, I am developing for Tizen wearable devices which run 32-bit (armel) .NET processes on 64-bit Linux kernels. Unlike normal 32-bit processes on other platforms, each user process has 4GB of address space.

As far as I can see in the DAC implementation of coreclr, most exposed address values are of type CLRDATA_ADDRESS, so we can safely assume they are sign-extended according to this comment. When the values are passed to ClrMD, however, they are (implicitly) converted to ulong (instead of long), which does not seem correct for negative CLRDATA_ADDRESS values.

For example, when the DAC calls into ClrMD using DataTargetAdapter::ReadVirtual() and the address is larger than 0x7FFFFFFF (let’s say 0xFFCCBBAA), the ulong value passed to DacDataTargetWrapper is much larger than expected (0xFFFFFFFFFFCCBBAA).

https://github.com/dotnet/runtime/blob/master/src/coreclr/src/debug/daccess/datatargetadapter.cpp#L186-L196
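The widening described above can be reproduced in isolation. The following is a minimal sketch, not code from coreclr or ClrMD: `sign_extend32` and `zero_extend32` are hypothetical helpers that only show how the same 32-bit address turns into two different 64-bit values depending on whether it passes through a signed type on the way.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; neither exists in coreclr or ClrMD.

// Widen a 32-bit address by going through a signed 32-bit type first.
// The top bit is copied into the upper 32 bits (sign extension).
constexpr std::uint64_t sign_extend32(std::uint32_t addr) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(addr)));
}

// Widen a 32-bit address directly as unsigned.
// The upper 32 bits are filled with zeros (zero extension).
constexpr std::uint64_t zero_extend32(std::uint32_t addr) {
    return static_cast<std::uint64_t>(addr);
}

// Addresses above 0x7FFFFFFF differ between the two conversions.
static_assert(sign_extend32(0xFFCCBBAAu) == 0xFFFFFFFFFFCCBBAAull, "sign-extended");
static_assert(zero_extend32(0xFFCCBBAAu) == 0x00000000FFCCBBAAull, "zero-extended");

// Addresses at or below 0x7FFFFFFF are identical either way,
// which is why the problem only shows up in the upper 2GB.
static_assert(sign_extend32(0x7FFFFFFFu) == zero_extend32(0x7FFFFFFFu), "same");
```

If one side of the interface produces the sign-extended form and the other side interprets it as a plain unsigned 64-bit address, a read at 0xFFCCBBAA is looked up at 0xFFFFFFFFFFCCBBAA instead.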

Is this conversion intentional (should I assume sign-extension for ulong)? Or am I missing something?

In my experiment, my sample code (which just prints a managed stack trace) worked after I modified the implementation of the TO_CDADDR macro in the coreclr runtime. I don’t think this is the right fix, however. Changing all occurrences of ulong in the DAC interfaces of ClrMD to long also looks bad, since it would require a lot of work and additional arithmetic in the code.
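To make the trade-off concrete, here is a rough sketch of what compensating on the receiving side could look like, written in C++ for illustration. It is an assumption, not what ClrMD or coreclr actually does: `normalize_target_address` and its `pointer_size` parameter are hypothetical names, and the idea is simply to drop the sign-extended upper bits when the target is known to be 32-bit.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not a ClrMD or coreclr API.
// addr:         a 64-bit address value as received from the DAC, possibly
//               sign-extended from a 32-bit target address.
// pointer_size: the target process's pointer size in bytes (4 or 8),
//               assumed to be known from the dump.
constexpr std::uint64_t normalize_target_address(std::uint64_t addr,
                                                 unsigned pointer_size) {
    // For a 32-bit target only the low 32 bits are meaningful, so the
    // upper bits (all ones after sign extension) are masked off.
    // For a 64-bit target the value is returned unchanged.
    return pointer_size == 4 ? (addr & 0xFFFFFFFFull) : addr;
}

static_assert(normalize_target_address(0xFFFFFFFFFFCCBBAAull, 4) == 0xFFCCBBAAull,
              "32-bit target: upper bits are dropped");
static_assert(normalize_target_address(0xFFFFFFFFFFCCBBAAull, 8) == 0xFFFFFFFFFFCCBBAAull,
              "64-bit target: value is left as-is");
```

A helper like this would have to be applied at every point where an address crosses the interface, which is the same "a lot of work" concern raised above.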

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 23 (23 by maintainers)

Most upvoted comments

@swift-kim: Can you test whether PR #535 fixes the issue or not? I have no way to test whether my changes were correct or if I missed something.

Sorry for the delay. I am taking a look at this now and I will likely have a pull request ready (for 2.0) by tonight or tomorrow for review.

I’d like to approach this in a more methodical way to fix it for the entire library and not try to spot-fix the specific locations that make it work for this specific problem. I won’t be able to test this fully on arm64 this week, but hopefully you can try it out and let me know if it works.

“devices which run 32-bit (armel) on 64-bit Linux kernels”

“accounting for sign extension on x86”

You might be talking about different platforms (in case it makes a difference for sign extension expectations).