runtime: Bounds checks are no longer elided when using nint for indexing.

Consider the following two methods, which differ only in whether they use int or nint to index into an array:

public static int SumWithInt32Index(int[] a)
{
    int result = 0;

    for (int i = 0; i < a.Length; i++)
    {
        result += a[i];
    }

    return result;
}

public static int SumWithNativeIntIndex(int[] a)
{
    int result = 0;
        
    for (nint i = 0; i < a.Length; i++)
    {
        result += a[i];
    }
        
    return result;
}

These two methods result in the following codegen:

; ConsoleApp64.Benchmarks.SumWithInt32Index(Int32[])
       xor       eax,eax
       xor       edx,edx
       mov       r8d,[rcx+8]
       test      r8d,r8d
       jle       short M01_L01
M01_L00:
       movsxd    r9,edx
       add       eax,[rcx+r9*4+10]
       inc       edx
       cmp       r8d,edx
       jg        short M01_L00
M01_L01:
       ret
; Total bytes of code 29

; ConsoleApp64.Benchmarks.SumWithNativeIntIndex(Int32[])
       sub       rsp,28
       xor       eax,eax
       xor       edx,edx
       mov       r8d,[rcx+8]
       movsxd    r8,r8d
       test      r8,r8
       jle       short M01_L01
M01_L00:
       cmp       rdx,r8
       jae       short M01_L02
       add       eax,[rcx+rdx*4+10]
       inc       rdx
       cmp       r8,rdx
       jg        short M01_L00
M01_L01:
       add       rsp,28
       ret
M01_L02:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3
; Total bytes of code 48

As can be seen, the latter can no longer elide the bounds check because of the additional cast of a.Length to nint, which results in significantly worse codegen. To work around this, a developer must use Unsafe.Add to avoid the bounds checks inside the loop and MemoryMarshal.GetArrayDataReference to avoid the bounds check outside it. Even then, while the inner loop does improve, the overall codegen is still slightly worse:

; ConsoleApp64.Benchmarks.SumWithNativeIntIndex(Int32[])
       xor       eax,eax
       lea       rdx,[rcx+10]
       xor       r8d,r8d
       mov       ecx,[rcx+8]
       movsxd    rcx,ecx
       test      rcx,rcx
       jle       short M01_L01
M01_L00:
       add       eax,[rdx+r8*4]
       inc       r8
       cmp       rcx,r8
       jg        short M01_L00
M01_L01:
       ret
; Total bytes of code 33
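
The issue does not show the source for this workaround. A minimal sketch of what it might look like, assuming .NET 5 or later, follows; the class and method names are hypothetical. It skips bounds checks entirely, so it is only safe because the loop keeps i within [0, a.Length).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class NativeIndexWorkaround
{
    // Hypothetical name; sums the array using an nint index without
    // per-element bounds checks.
    public static int SumWithNativeIntIndexUnsafe(int[] a)
    {
        int result = 0;

        // Reference to element 0, obtained once outside the loop.
        // For an empty array the reference is never dereferenced.
        ref int start = ref MemoryMarshal.GetArrayDataReference(a);

        for (nint i = 0; i < a.Length; i++)
        {
            // Unsafe.Add performs ref arithmetic only; nothing verifies
            // that i is in range, so the loop condition must guarantee it.
            result += Unsafe.Add(ref start, i);
        }

        return result;
    }
}
```

Correctness here rests entirely on the loop condition, which is the extra burden the issue is asking the JIT to remove.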

It would be beneficial if using nint wherever it is accepted “just” worked.

category:cq theme:bounds-checks skill-level:intermediate cost:small impact:small

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

Moving to .NET 7 since it is a feature change.

Arrays are always indexed by an int32

Is that right? I know the upper limit for the number of elements is currently 32 bits, but I believe on both x86 and ARM the actual addressing has to be done as a pointer that the CPU can process.

The ldelem, ldelema, newarr, and stelem instructions all support either int32 or native int indexes. For example, from the description of ldelem:

The ldelem.<type> instruction loads the value of the element with index index (of type int32 or native int) in the zero-based one-dimensional array array and places it on the top of the stack.

The JIT has to implicitly upcast any int index to a nint index (which is a nop on 32-bit but a sign extension on 64-bit) because most underlying hardware requires all addressing to be done via native-int sized values (hence the movsxd r9, edx in the loop). So, even if the upper bound of an array is limited to 32 bits, I think it's reasonable to expect that using nint/nuint for an index will work as expected and continue to have bounds checks elided.
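
As a small illustration of the point above, the C# compiler accepts both index types for array access, and an int converts to nint implicitly. The sketch below uses hypothetical type and method names and only demonstrates behavior, not the codegen.

```csharp
using System;

public static class IndexTypeDemo
{
    // Indexes with a 32-bit index; on 64-bit the JIT widens it for addressing.
    public static int ReadWithInt32Index(int[] a, int index) => a[index];

    // Indexes with a native-sized index; no widening is needed.
    public static int ReadWithNativeIntIndex(int[] a, nint index) => a[index];

    // int to nint is an implicit, value-preserving conversion
    // (a sign extension on 64-bit, a no-op on 32-bit).
    public static nint Widen(int index) => index;
}
```

Both read methods remain bounds-checked; the issue concerns only whether the JIT can prove the check redundant inside a loop.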

Why can’t that be changed to use int instead of nint?

When dealing with interop code, many values given from native may in fact be nint or nuint even if the actual indexes most frequently fit in 32 bits. Having to downcast to 32 bits results in a truncation on 64-bit, which then must be sign extended back to nint for indexing, which I think is even harder for the JIT to reason about (and which results in bad codegen today). That leaves users who have interop code needing to use even more unsafe code, such as MemoryMarshal.GetArrayDataReference and Unsafe.Add, to ensure good codegen is produced.

The bad codegen from downcasting an nint to int:

G_M28940_IG01:
       sub      rsp, 40
						;; bbWeight=1    PerfScore 0.25
G_M28940_IG02:
       xor      eax, eax
       xor      rdx, rdx
       mov      r8d, dword ptr [rcx+8]
       movsxd   r9, r8d
       test     r9, r9
       jle      SHORT G_M28940_IG04
						;; bbWeight=1    PerfScore 4.00
G_M28940_IG03:
       cmp      edx, r8d
       jae      SHORT G_M28940_IG05
       movsxd   r10, edx
       add      eax, dword ptr [rcx+4*r10+16]
       inc      rdx
       cmp      r9, rdx
       jg       SHORT G_M28940_IG03
						;; bbWeight=4    PerfScore 20.00
G_M28940_IG04:
       add      rsp, 40
       ret      
						;; bbWeight=1    PerfScore 1.25
G_M28940_IG05:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
						;; bbWeight=0    PerfScore 0.00

; Total bytes of code 52, prolog size 4, PerfScore 30.70, instruction count 18 
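
The source for this listing is not shown in the thread. A loop of roughly the following shape, which keeps an nint counter but casts it to int at the point of indexing, is one plausible origin for it; this is an assumption, and the names are hypothetical.

```csharp
using System;

public static class DowncastDemo
{
    // Hypothetical reconstruction: the loop counter is native-sized,
    // but the index is truncated to int before the array access.
    public static int SumWithDowncastIndex(int[] a)
    {
        int result = 0;

        for (nint i = 0; i < a.Length; i++)
        {
            // The explicit cast truncates on 64-bit; the JIT then has to
            // sign-extend the value again to form the element address.
            result += a[(int)i];
        }

        return result;
    }
}
```

This would be consistent with the 32-bit compare (cmp edx, r8d) followed by movsxd r10, edx in the loop body above, alongside the 64-bit counter increment.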

nint is basically long on 64-bit systems and int on 32-bit systems.

Right, it is the same number of bits as a pointer, which for refs and pointers produces better codegen. It just doesn’t play well with arrays today, likely because the JIT views the array length as an int32 rather than as a native int that is between 0 and near Int32.MaxValue.

Is there an example in System.Private.Corelib where we use nint?

Yes, we use nint in a number of places, but mostly as a performance optimization in pointer and ref arithmetic because it avoids needing to upcast/downcast values. Since this just deals with pointers/refs, it doesn’t have the same issue that arrays have, where the JIT interprets the length as a 32-bit signed integer.

Prior to .NET 5, we used #ifdef to simulate having nint, but that was switched over to nint proper once the language support became available (https://github.com/dotnet/runtime/pull/36159). You can also find other PRs dealing with nint here: https://github.com/dotnet/runtime/pulls?q=is%3Apr+nint+is%3Aclosed