clrmd: Unable to analyze 4GB dumps of 32-bit processes

It seems ClrMD is not able to process 32-bit memory dumps larger than 2GB due to inconsistent address conversion between coreclr and ClrMD.

As @leculver and @mikem8361 already discussed in #280, using ulong types for address values may lead to unexpected behavior on particular platforms.

For your information, I am developing for Tizen wearable devices which run 32-bit (armel) .NET processes on 64-bit Linux kernels. Unlike normal 32-bit processes on other platforms, each user process has 4GB of address space.

As far as I can see in the DAC implementation of coreclr, most of the exposed address values are of type CLRDATA_ADDRESS, so we can safely assume they are sign-extended according to this comment. When the values are passed to ClrMD, however, they are (implicitly) converted to ulong (instead of long), which does not seem correct for negative CLRDATA_ADDRESS values.

For example, when the DAC calls into ClrMD using DataTargetAdapter::ReadVirtual() and the address is larger than 0x7FFFFFFF (let’s say 0xFFCCBBAA), the ulong value passed to DacDataTargetWrapper turns out to be far larger than expected (0xFFFFFFFFFFCCBBAA).

https://github.com/dotnet/runtime/blob/master/src/coreclr/src/debug/daccess/datatargetadapter.cpp#L186-L196
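The difference between the two widenings can be reproduced in isolation. The following is a minimal sketch, not code from coreclr or ClrMD: `ClrDataAddress`, `sign_extend`, `zero_extend`, and `check_extension_examples` are hypothetical names, and the 64-bit address type is modelled as a plain unsigned integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 64-bit address value (assumption: modelled
// here as an unsigned 64-bit integer, like ulong on the managed side).
using ClrDataAddress = std::uint64_t;

// Widen a 32-bit target address by sign extension: the value is first
// reinterpreted as a signed 32-bit integer, so bit 31 is copied into the
// upper 32 bits. (The unsigned-to-signed cast is implementation-defined
// before C++20, but wraps as two's complement on mainstream compilers.)
constexpr ClrDataAddress sign_extend(std::uint32_t target_addr) {
    return static_cast<ClrDataAddress>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(target_addr)));
}

// Widen a 32-bit target address by zero extension: the upper 32 bits are
// always zero, which is what a reader of an unsigned value would expect.
constexpr ClrDataAddress zero_extend(std::uint32_t target_addr) {
    return static_cast<ClrDataAddress>(target_addr);
}

// Spot-check the values quoted in the issue text.
inline void check_extension_examples() {
    // Above 0x7FFFFFFF the two widenings disagree.
    assert(sign_extend(0xFFCCBBAAu) == 0xFFFFFFFFFFCCBBAAull);
    assert(zero_extend(0xFFCCBBAAu) == 0x00000000FFCCBBAAull);
    // At or below 0x7FFFFFFF they agree, which is why dumps whose
    // addresses stay in the lower 2GB are unaffected.
    assert(sign_extend(0x7FFFFFFFu) == zero_extend(0x7FFFFFFFu));
}
```

Calling `check_extension_examples()` exercises both cases. Only addresses with bit 31 set are affected, which matches the 2GB threshold described above.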

Is this conversion intentional (should I assume sign-extension for ulong)? Or am I missing something?

In my experiment, my sample code (just printing a managed stack trace) worked after I modified the implementation of the TO_CDADDR macro in the coreclr runtime. I don’t think this is the right choice, however. Changing all occurrences of ulong in the DAC interfaces of ClrMD to long also looks bad, since it would require a lot of work and additional arithmetic in the code.
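To make the trade-off concrete, one alternative to both options is to normalize addresses at the boundary based on the target's pointer size. This is only an illustrative sketch under that assumption. It is not taken from coreclr or ClrMD, it is not a description of what any pull request does, and `normalize_address`, `to_dac_address`, and `check_round_trip` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: turn a possibly sign-extended 64-bit address into
// the real address in the target. For 4-byte-pointer targets the upper
// 32 bits carry no information, so they are masked off.
constexpr std::uint64_t normalize_address(std::uint64_t addr,
                                          unsigned pointer_size) {
    return pointer_size == 4 ? (addr & 0xFFFFFFFFull) : addr;
}

// Hypothetical inverse: produce the sign-extended form again before
// handing an address back to code that expects that convention.
constexpr std::uint64_t to_dac_address(std::uint64_t addr,
                                       unsigned pointer_size) {
    return pointer_size == 4
        ? static_cast<std::uint64_t>(static_cast<std::int64_t>(
              static_cast<std::int32_t>(static_cast<std::uint32_t>(addr))))
        : addr;
}

// Spot-check with the example address from the issue.
inline void check_round_trip() {
    // A sign-extended address from a 32-bit target maps back to 32 bits.
    assert(normalize_address(0xFFFFFFFFFFCCBBAAull, 4) == 0xFFCCBBAAull);
    // An address that is already normalized is left alone.
    assert(normalize_address(0xFFCCBBAAull, 4) == 0xFFCCBBAAull);
    // 64-bit targets are passed through untouched.
    assert(normalize_address(0xFFFFFFFFFFCCBBAAull, 8) == 0xFFFFFFFFFFCCBBAAull);
    // Converting back restores the sign-extended form.
    assert(to_dac_address(0xFFCCBBAAull, 4) == 0xFFFFFFFFFFCCBBAAull);
}
```

The cost of this approach is that every entry and exit point has to apply the conversion consistently, which is the same kind of bookkeeping the issue worries about.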

About this issue

State: closed
Created 5 years ago
Comments: 23 (23 by maintainers)

Most upvoted comments

@swift-kim: Can you test whether PR #535 fixes the issue or not? I have no way to test whether my changes were correct or if I missed something.

leculver on Jan 31, 2020

Sorry for the delay. I am taking a look at this now and I will likely have a pull request ready (for 2.0) by tonight or tomorrow for review.

I’d like to approach this in a more methodical way to fix it for the entire library and not try to spot-fix the specific locations that make it work for this specific problem. I won’t be able to test this fully on arm64 this week, but hopefully you can try it out and let me know if it works.

leculver on Dec 12, 2019

"devices which run 32-bit (armel) on 64-bit Linux kernels"

"accounting for sign extension on x86"

You might be talking about different platforms (in case it makes a difference for sign-extension expectations).

weltkante on Dec 10, 2019