runtime: Vector128:get_Zero() doesn't inline (or intrinsicify) at crossgen

Nor do Vector128<byte>.Count or Vector128:AsByte(Vector128`1):Vector128`1

ASCIIUtility.WidenAsciiToUtf16_Sse2 calls Vector128<byte>.Zero

https://github.com/dotnet/runtime/blob/fd181c0210c263e40ce8b908bdd521d8f3cc284e/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L1633-L1637

This ends up reserving stack space, making a call, and reading the result back from the stack, all just to zero an xmm register:

G_M55642_IG05:
       lea      rcx, [rsp+20H]
       call     [Vector128`1:get_Zero():Vector128`1]
       movaps   xmm0, xmmword ptr [rsp+20H]
       movaps   xmm1, xmm6
       punpcklbw xmm1, xmm0
       movdqu   xmmword ptr [rsi], xmm1
       mov      rax, rsi
       shr      rax, 1
       and      rax, 7
       mov      edx, 8
       sub      rdx, rax
       mov      rax, rdx
       sub      rbx, 16

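This is quite inefficient: zeroing an xmm register normally takes a single register-only instruction (for example, xorps xmm0, xmm0), with no call and no stack traffic.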
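For context, the source pattern behind this disassembly looks roughly like the sketch below. It is a minimal illustration, not the actual ASCIIUtility code: the names WidenSketch and WidenLow are hypothetical, and the scalar fallback is an assumption added only so the sketch also runs on hardware without SSE2. The point is that the zero vector exists only to feed punpcklbw, so it should never need a call.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration type; not part of System.Private.CoreLib.
static class WidenSketch
{
    // Widens the low 8 bytes of `ascii` into 8 UTF-16 code units by
    // interleaving each byte with a zero byte (punpcklbw on x86).
    public static Vector128<ushort> WidenLow(Vector128<byte> ascii)
    {
        if (Sse2.IsSupported)
        {
            // This is the property the issue is about: ideally it becomes
            // a register-zeroing instruction rather than a call.
            Vector128<byte> zero = Vector128<byte>.Zero;
            return Sse2.UnpackLow(ascii, zero).AsUInt16();
        }

        // Assumed scalar fallback so the sketch is portable.
        return Vector128.Create(
            (ushort)ascii.GetElement(0), (ushort)ascii.GetElement(1),
            (ushort)ascii.GetElement(2), (ushort)ascii.GetElement(3),
            (ushort)ascii.GetElement(4), (ushort)ascii.GetElement(5),
            (ushort)ascii.GetElement(6), (ushort)ascii.GetElement(7));
    }
}
```

Calling WidenSketch.WidenLow(Vector128.Create((byte)0x41)) yields eight UTF-16 code units that each hold 0x0041.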

/cc @tannergooding @GrabYourPitchforks

category:cq theme:intrinsics skill-level:expert cost:medium

About this issue

State: closed
Created 4 years ago
Comments: 32 (32 by maintainers)

Most upvoted comments

That seems reasonable to do, although we need to take care to put documentation in the code about why this is ok, and notes in the codegen code about this.

davidwrighton on Aug 11, 2020

Here is what I found so far.

Allowing Vector128.As and Vector128.AsByte through Vector128.AsUInt64 to be unconditionally expanded when compiling System.Private.CoreLib.dll (S.P.C.dll) leads to these results:

Found 1 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -7601 (-0.199% of base)
    diff is an improvement.

Top file improvements (bytes):
       -7601 : System.Private.CoreLib.dasm (-0.199% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -429 (-34.513% of base) : System.Private.CoreLib.dasm - System.Text.ASCIIUtility:NarrowUtf16ToAscii_Sse2(long,long,long):long
        -429 (-34.347% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:NarrowUtf16ToLatin1_Sse2(long,long,long):long
        -372 (-10.632% of base) : System.Private.CoreLib.dasm - System.Text.Unicode.Utf8Utility:TranscodeToUtf8(long,int,long,int,byref,byref):int
        -283 (-7.453% of base) : System.Private.CoreLib.dasm - System.Text.Unicode.Utf8Utility:TranscodeToUtf16(long,int,long,int,byref,byref):int
        -250 (-18.671% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:GetIndexOfFirstNonLatin1Char_Sse2(long,long):long
        -244 (-18.141% of base) : System.Private.CoreLib.dasm - System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Sse2(long,long):long
        -173 (-14.686% of base) : System.Private.CoreLib.dasm - System.Number:FormatDouble(byref,double,System.ReadOnlySpan`1[Char],System.Globalization.NumberFormatInfo):System.String
        -173 (-14.965% of base) : System.Private.CoreLib.dasm - System.Number:FormatSingle(byref,float,System.ReadOnlySpan`1[Char],System.Globalization.NumberFormatInfo):System.String
        -140 (-12.302% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunDouble(double,int,byref):bool
        -140 (-12.891% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunSingle(float,int,byref):bool
        -126 (-13.801% of base) : System.Private.CoreLib.dasm - System.Number:NumberToFloatingPointBits(byref,byref):long
        -109 (-26.456% of base) : System.Private.CoreLib.dasm - System.Half:op_Explicit(System.Half):float
        -108 (-2.032% of base) : System.Private.CoreLib.dasm - System.Diagnostics.Tracing.EventPipePayloadDecoder:DecodePayload(byref,System.ReadOnlySpan`1[Byte]):System.Object[]
        -103 (-20.276% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:WidenLatin1ToUtf16_Sse2(long,long,long)
        -102 (-43.966% of base) : System.Private.CoreLib.dasm - System.Math:<CopySign>g__SoftwareFallback|46_0(double,double):double
        -102 (-47.442% of base) : System.Private.CoreLib.dasm - System.MathF:<CopySign>g__SoftwareFallback|36_0(float,float):float
         -94 (-7.198% of base) : System.Private.CoreLib.dasm - System.Variant:MarshalHelperConvertObjectToVariant(System.Object,byref)
         -91 (-3.669% of base) : System.Private.CoreLib.dasm - System.Variant:MarshalHelperCastVariant(System.Object,int,byref)
         -85 (-15.071% of base) : System.Private.CoreLib.dasm - System.Variant:ToObject():System.Object:this
         -79 (-9.371% of base) : System.Private.CoreLib.dasm - System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():bool

Top method improvements (percentages):
         -58 (-71.605% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Single][System.Single]:Equals(System.Runtime.Intrinsics.Vector128`1[Single]):bool:this
         -58 (-69.880% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:Equals(System.Runtime.Intrinsics.Vector128`1[Double]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:Equals(System.Runtime.Intrinsics.Vector128`1[Int64]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:Equals(System.Runtime.Intrinsics.Vector128`1[Int32]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int16][System.Int16]:Equals(System.Runtime.Intrinsics.Vector128`1[Int16]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:Equals(System.Runtime.Intrinsics.Vector128`1[Byte]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt16][System.UInt16]:Equals(System.Runtime.Intrinsics.Vector128`1[UInt16]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt64][System.UInt64]:Equals(System.Runtime.Intrinsics.Vector128`1[UInt64]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt32][System.UInt32]:Equals(System.Runtime.Intrinsics.Vector128`1[UInt32]):bool:this
         -58 (-69.048% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[SByte][System.SByte]:Equals(System.Runtime.Intrinsics.Vector128`1[SByte]):bool:this
         -29 (-67.442% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
         -29 (-67.442% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(ushort):System.Runtime.Intrinsics.Vector128`1[UInt16]
         -29 (-61.702% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(short):System.Runtime.Intrinsics.Vector128`1[Int16]
         -29 (-61.702% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(byte):System.Runtime.Intrinsics.Vector128`1[SByte]
        -102 (-47.442% of base) : System.Private.CoreLib.dasm - System.MathF:<CopySign>g__SoftwareFallback|36_0(float,float):float
         -26 (-44.828% of base) : System.Private.CoreLib.dasm - System.BitConverter:SingleToInt32Bits(float):int
         -26 (-44.068% of base) : System.Private.CoreLib.dasm - System.BitConverter:DoubleToInt64Bits(double):long
        -102 (-43.966% of base) : System.Private.CoreLib.dasm - System.Math:<CopySign>g__SoftwareFallback|46_0(double,double):double
         -67 (-41.875% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Single][System.Single]:Equals(System.Object):bool:this
         -67 (-41.358% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:Equals(System.Object):bool:this

121 total methods with Code Size differences (121 improved, 0 regressed), 28561 unchanged.

If, in addition to the As* methods, the following are also treated as intrinsics (Vector128.Create, Vector128.CreateScalarUnsafe, Vector128.ToScalar):

Found 1 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -12573 (-0.329% of base)
    diff is an improvement.

Top file improvements (bytes):
      -12573 : System.Private.CoreLib.dasm (-0.329% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -410 (-32.985% of base) : System.Private.CoreLib.dasm - System.Text.ASCIIUtility:NarrowUtf16ToAscii_Sse2(long,long,long):long
        -410 (-32.826% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:NarrowUtf16ToLatin1_Sse2(long,long,long):long
        -376 (-10.746% of base) : System.Private.CoreLib.dasm - System.Text.Unicode.Utf8Utility:TranscodeToUtf8(long,int,long,int,byref,byref):int
        -352 (-29.881% of base) : System.Private.CoreLib.dasm - System.Number:FormatDouble(byref,double,System.ReadOnlySpan`1[Char],System.Globalization.NumberFormatInfo):System.String
        -349 (-30.190% of base) : System.Private.CoreLib.dasm - System.Number:FormatSingle(byref,float,System.ReadOnlySpan`1[Char],System.Globalization.NumberFormatInfo):System.String
        -344 (-6.472% of base) : System.Private.CoreLib.dasm - System.Diagnostics.Tracing.EventPipePayloadDecoder:DecodePayload(byref,System.ReadOnlySpan`1[Byte]):System.Object[]
        -329 (-28.910% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunDouble(double,int,byref):bool
        -295 (-27.164% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunSingle(float,int,byref):bool
        -283 (-7.453% of base) : System.Private.CoreLib.dasm - System.Text.Unicode.Utf8Utility:TranscodeToUtf16(long,int,long,int,byref,byref):int
        -279 (-46.656% of base) : System.Private.CoreLib.dasm - System.Math:Round(double):double
        -262 (-51.779% of base) : System.Private.CoreLib.dasm - System.MathF:Round(float):float
        -259 (-19.343% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:GetIndexOfFirstNonLatin1Char_Sse2(long,long):long
        -253 (-18.810% of base) : System.Private.CoreLib.dasm - System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Sse2(long,long):long
        -205 (-49.757% of base) : System.Private.CoreLib.dasm - System.Half:op_Explicit(System.Half):float
        -189 (-87.907% of base) : System.Private.CoreLib.dasm - System.MathF:<CopySign>g__SoftwareFallback|36_0(float,float):float
        -187 (-80.603% of base) : System.Private.CoreLib.dasm - System.Math:<CopySign>g__SoftwareFallback|46_0(double,double):double
        -177 (-19.387% of base) : System.Private.CoreLib.dasm - System.Number:NumberToFloatingPointBits(byref,byref):long
        -152 (-65.801% of base) : System.Private.CoreLib.dasm - System.MathF:BitIncrement(float):float
        -151 (-66.520% of base) : System.Private.CoreLib.dasm - System.MathF:BitDecrement(float):float
        -149 (-60.081% of base) : System.Private.CoreLib.dasm - System.Math:BitDecrement(double):double

Top method improvements (percentages):
         -65 (-92.857% of base) : System.Private.CoreLib.dasm - System.BitConverter:Int32BitsToSingle(int):float
         -65 (-91.549% of base) : System.Private.CoreLib.dasm - System.BitConverter:Int64BitsToDouble(long):double
         -53 (-91.379% of base) : System.Private.CoreLib.dasm - System.BitConverter:SingleToInt32Bits(float):int
         -53 (-89.831% of base) : System.Private.CoreLib.dasm - System.BitConverter:DoubleToInt64Bits(double):long
        -189 (-87.907% of base) : System.Private.CoreLib.dasm - System.MathF:<CopySign>g__SoftwareFallback|36_0(float,float):float
         -94 (-81.739% of base) : System.Private.CoreLib.dasm - System.MathF:CopySign(float,float):float
        -187 (-80.603% of base) : System.Private.CoreLib.dasm - System.Math:<CopySign>g__SoftwareFallback|46_0(double,double):double
         -53 (-80.303% of base) : System.Private.CoreLib.dasm - System.Single:IsNegative(float):bool
         -53 (-77.941% of base) : System.Private.CoreLib.dasm - System.Double:IsNegative(double):bool
        -115 (-75.163% of base) : System.Private.CoreLib.dasm - System.Math:Min(float,float):float
         -57 (-75.000% of base) : System.Private.CoreLib.dasm - System.BitConverter:ToSingle(System.Byte[],int):float
         -57 (-75.000% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:ReadSingle(long):float:this
         -61 (-74.390% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:Write(long,float):this
         -61 (-74.390% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:Write(long,double):this
         -57 (-74.026% of base) : System.Private.CoreLib.dasm - System.BitConverter:ToDouble(System.Byte[],int):double
         -57 (-74.026% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:ReadDouble(long):double:this
        -115 (-72.785% of base) : System.Private.CoreLib.dasm - System.Math:Min(double,double):double
         -53 (-71.622% of base) : System.Private.CoreLib.dasm - System.Single:IsFinite(float):bool
         -53 (-71.622% of base) : System.Private.CoreLib.dasm - System.Single:IsInfinity(float):bool
         -58 (-71.605% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Single][System.Single]:Equals(System.Runtime.Intrinsics.Vector128`1[Single]):bool:this

128 total methods with Code Size differences (128 improved, 0 regressed), 28554 unchanged.
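The large percentage wins for the BitConverter and floating-point helpers above are what one would expect if those methods move a scalar through a Vector128 and reinterpret it. The following is a minimal sketch of that kind of pattern, assuming that shape; ReinterpretSketch and FloatBits are hypothetical names, and this is not the CoreLib implementation. When CreateScalarUnsafe, AsInt32, and ToScalar are all expanded in place, the whole body can reduce to a register move.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical illustration type; not part of System.Private.CoreLib.
static class ReinterpretSketch
{
    // Returns the raw IEEE 754 bits of a float by routing it through
    // a Vector128 and reinterpreting the element type.
    public static int FloatBits(float value)
    {
        // Only element 0 is defined; the upper elements are unspecified.
        Vector128<float> asFloat = Vector128.CreateScalarUnsafe(value);

        // As* methods change only the static element type, not the bits.
        Vector128<int> asInt = asFloat.AsInt32();

        // Read element 0 back as a scalar.
        return asInt.ToScalar();
    }
}
```

If any one of the three calls is left as a real call, the value has to be spilled to the stack and reloaded around it, which is the overhead these diffs remove.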

Doing this only for Vector128<T>.Count and Vector256<T>.Count:

Found 1 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -658 (-0.017% of base)
    diff is an improvement.

Top file improvements (bytes):
        -658 : System.Private.CoreLib.dasm (-0.017% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method regressions (bytes):
         178 (136.923% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Single][System.Single]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt32][System.UInt32]:GetHashCode():int:this
          76 (46.914% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt64][System.UInt64]:GetHashCode():int:this

Top method improvements (bytes):
         -57 (-61.957% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:UnalignedCountVector128(byref):long (2 methods)
         -32 (-4.097% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:ToString():System.String:this
         -31 (-3.861% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[SByte][System.SByte]:ToString():System.String:this
         -31 (-3.914% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[Int32][System.Int32]:ToString():System.String:this
         -31 (-3.890% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[Int64][System.Int64]:ToString():System.String:this
         -31 (-3.861% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[Int16][System.Int16]:ToString():System.String:this
         -31 (-3.995% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:ToString():System.String:this
         -31 (-3.939% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int16][System.Int16]:ToString():System.String:this
         -31 (-3.939% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[SByte][System.SByte]:ToString():System.String:this
         -28 (-11.475% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[__Canon][System.__Canon]:<Equals>g__SoftwareFallback|12_0(byref,System.Runtime.Intrinsics.Vector128`1[__Canon]):bool
         -27 (-71.053% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetCharVector128SpanLength(long,long):long
         -27 (-71.053% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetCharVector256SpanLength(long,long):long
         -25 (-9.542% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[__Canon][System.__Canon]:<Equals>g__SoftwareFallback|14_0(byref,System.Runtime.Intrinsics.Vector256`1[__Canon]):bool
         -24 (-75.000% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetByteVector128SpanLength(long,int):long
         -24 (-75.000% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetByteVector256SpanLength(long,int):long
         -24 (-3.230% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[__Canon][System.__Canon]:ToString():System.String:this
         -23 (-3.030% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[__Canon][System.__Canon]:ToString():System.String:this
         -22 (-12.500% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[__Canon][System.__Canon]:GetHashCode():int:this
         -22 (-11.892% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[__Canon][System.__Canon]:GetHashCode():int:this
         -22 (-3.121% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:ToString():System.String:this

Top method regressions (percentages):
         178 (136.923% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Single][System.Single]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt32][System.UInt32]:GetHashCode():int:this
          76 (46.914% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt64][System.UInt64]:GetHashCode():int:this

Top method improvements (percentages):
         -24 (-75.000% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetByteVector128SpanLength(long,int):long
         -24 (-75.000% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetByteVector256SpanLength(long,int):long
         -27 (-71.053% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetCharVector128SpanLength(long,long):long
         -27 (-71.053% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetCharVector256SpanLength(long,long):long
         -57 (-61.957% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:UnalignedCountVector128(byref):long (2 methods)
         -15 (-15.000% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:GetHashCode():int:this
         -15 (-15.000% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt16][System.UInt16]:GetHashCode():int:this
         -15 (-14.851% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int16][System.Int16]:GetHashCode():int:this
         -15 (-14.851% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[SByte][System.SByte]:GetHashCode():int:this
         -15 (-13.889% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[UInt32][System.UInt32]:GetHashCode():int:this
         -15 (-13.889% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[Int32][System.Int32]:GetHashCode():int:this
         -15 (-13.761% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[UInt16][System.UInt16]:GetHashCode():int:this
         -15 (-13.761% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[Byte][System.Byte]:GetHashCode():int:this
         -15 (-13.636% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[SByte][System.SByte]:GetHashCode():int:this
         -15 (-13.636% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[Int16][System.Int16]:GetHashCode():int:this
         -15 (-13.043% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[UInt64][System.UInt64]:GetHashCode():int:this
         -15 (-13.043% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[Int64][System.Int64]:GetHashCode():int:this
         -22 (-12.500% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[__Canon][System.__Canon]:GetHashCode():int:this
         -22 (-11.892% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[__Canon][System.__Canon]:GetHashCode():int:this
         -15 (-11.628% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[UInt32][System.UInt32]:<Equals>g__SoftwareFallback|14_0(byref,System.Runtime.Intrinsics.Vector256`1[UInt32]):bool

61 total methods with Code Size differences (55 improved, 6 regressed), 28621 unchanged.

Note that the code size increase in the GetHashCode() methods is due to loop unrolling kicking in when these methods are compiled, since the loop bound becomes a constant.
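To illustrate the unrolling effect, here is a minimal sketch of a loop whose bound is Vector128<int>.Count. CountLoopSketch and SumElements are hypothetical names chosen for this example, and the body is a simple sum rather than the real GetHashCode logic. Once Count is expanded to the constant 4, the JIT can see a fixed trip count and may unroll the loop, trading code size for fewer branches.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical illustration type; not part of System.Private.CoreLib.
static class CountLoopSketch
{
    // Sums all elements of the vector with a loop bounded by Count.
    public static int SumElements(Vector128<int> v)
    {
        int sum = 0;

        // Vector128<int>.Count is 4 (16 bytes / 4 bytes per int). If it is
        // treated as a constant, this loop has a known trip count.
        for (int i = 0; i < Vector128<int>.Count; i++)
        {
            sum += v.GetElement(i);
        }

        return sum;
    }
}
```

For example, CountLoopSketch.SumElements(Vector128.Create(1, 2, 3, 4)) returns 10.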

Doing this only for Vector128<T>.Zero:

Found 1 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -801 (-0.021% of base)
    diff is an improvement.

Top file improvements (bytes):
        -801 : System.Private.CoreLib.dasm (-0.021% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -76 (-2.002% of base) : System.Private.CoreLib.dasm - System.Text.Unicode.Utf8Utility:TranscodeToUtf16(long,int,long,int,byref,byref):int
         -37 (-27.007% of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4:op_UnaryNegation(System.Numerics.Matrix4x4):System.Numerics.Matrix4x4
         -25 (-36.765% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(double):System.Runtime.Intrinsics.Vector128`1[Double]
         -25 (-36.765% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(float):System.Runtime.Intrinsics.Vector128`1[Single]
         -23 (-4.528% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:WidenLatin1ToUtf16_Sse2(long,long,long)
         -23 (-42.593% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<CreateScalar>g__SoftwareFallback|51_0(double):System.Runtime.Intrinsics.Vector128`1[Double]
         -23 (-42.593% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<CreateScalar>g__SoftwareFallback|56_0(float):System.Runtime.Intrinsics.Vector128`1[Single]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int16]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector128`1[Int64]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[UInt16]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector128`1[UInt32]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[UInt64],System.Runtime.Intrinsics.Vector64`1[UInt64]):System.Runtime.Intrinsics.Vector128`1[UInt64]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|40_0(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|41_0(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|42_0(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int16]

Top method improvements (percentages):
         -23 (-42.593% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<CreateScalar>g__SoftwareFallback|51_0(double):System.Runtime.Intrinsics.Vector128`1[Double]
         -23 (-42.593% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<CreateScalar>g__SoftwareFallback|56_0(float):System.Runtime.Intrinsics.Vector128`1[Single]
         -25 (-36.765% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(double):System.Runtime.Intrinsics.Vector128`1[Double]
         -25 (-36.765% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(float):System.Runtime.Intrinsics.Vector128`1[Single]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int16]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector128`1[Int64]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[UInt16]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector128`1[UInt32]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:Create(System.Runtime.Intrinsics.Vector64`1[UInt64],System.Runtime.Intrinsics.Vector64`1[UInt64]):System.Runtime.Intrinsics.Vector128`1[UInt64]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|40_0(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|41_0(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|42_0(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int16]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|43_0(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|44_0(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector128`1[Int64]
         -21 (-36.207% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|45_0(System.Runtime.Intrinsics.Vector64`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]

37 total methods with Code Size differences (37 improved, 0 regressed), 28645 unchanged.

All of these together:

Found 1 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -14074 (-0.368% of base)
    diff is an improvement.

Top file improvements (bytes):
      -14074 : System.Private.CoreLib.dasm (-0.368% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method regressions (bytes):
         178 (136.923% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Single][System.Single]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt32][System.UInt32]:GetHashCode():int:this
          76 (46.914% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt64][System.UInt64]:GetHashCode():int:this

Top method improvements (bytes):
        -410 (-32.985% of base) : System.Private.CoreLib.dasm - System.Text.ASCIIUtility:NarrowUtf16ToAscii_Sse2(long,long,long):long
        -410 (-32.826% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:NarrowUtf16ToLatin1_Sse2(long,long,long):long
        -376 (-10.746% of base) : System.Private.CoreLib.dasm - System.Text.Unicode.Utf8Utility:TranscodeToUtf8(long,int,long,int,byref,byref):int
        -365 (-9.613% of base) : System.Private.CoreLib.dasm - System.Text.Unicode.Utf8Utility:TranscodeToUtf16(long,int,long,int,byref,byref):int
        -352 (-29.881% of base) : System.Private.CoreLib.dasm - System.Number:FormatDouble(byref,double,System.ReadOnlySpan`1[Char],System.Globalization.NumberFormatInfo):System.String
        -349 (-30.190% of base) : System.Private.CoreLib.dasm - System.Number:FormatSingle(byref,float,System.ReadOnlySpan`1[Char],System.Globalization.NumberFormatInfo):System.String
        -344 (-6.472% of base) : System.Private.CoreLib.dasm - System.Diagnostics.Tracing.EventPipePayloadDecoder:DecodePayload(byref,System.ReadOnlySpan`1[Byte]):System.Object[]
        -329 (-28.910% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunDouble(double,int,byref):bool
        -295 (-27.164% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunSingle(float,int,byref):bool
        -279 (-46.656% of base) : System.Private.CoreLib.dasm - System.Math:Round(double):double
        -262 (-51.779% of base) : System.Private.CoreLib.dasm - System.MathF:Round(float):float
        -259 (-19.343% of base) : System.Private.CoreLib.dasm - System.Text.Latin1Utility:GetIndexOfFirstNonLatin1Char_Sse2(long,long):long
        -253 (-18.810% of base) : System.Private.CoreLib.dasm - System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Sse2(long,long):long
        -205 (-49.757% of base) : System.Private.CoreLib.dasm - System.Half:op_Explicit(System.Half):float
        -189 (-87.907% of base) : System.Private.CoreLib.dasm - System.MathF:<CopySign>g__SoftwareFallback|36_0(float,float):float
        -187 (-80.603% of base) : System.Private.CoreLib.dasm - System.Math:<CopySign>g__SoftwareFallback|46_0(double,double):double
        -177 (-19.387% of base) : System.Private.CoreLib.dasm - System.Number:NumberToFloatingPointBits(byref,byref):long
        -152 (-65.801% of base) : System.Private.CoreLib.dasm - System.MathF:BitIncrement(float):float
        -151 (-66.520% of base) : System.Private.CoreLib.dasm - System.MathF:BitDecrement(float):float
        -149 (-60.081% of base) : System.Private.CoreLib.dasm - System.Math:BitDecrement(double):double

Top method regressions (percentages):
         178 (136.923% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Single][System.Single]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:GetHashCode():int:this
          85 (85.859% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt32][System.UInt32]:GetHashCode():int:this
          76 (46.914% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:GetHashCode():int:this
          18 (16.981% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[UInt64][System.UInt64]:GetHashCode():int:this

Top method improvements (percentages):
         -65 (-92.857% of base) : System.Private.CoreLib.dasm - System.BitConverter:Int32BitsToSingle(int):float
         -65 (-91.549% of base) : System.Private.CoreLib.dasm - System.BitConverter:Int64BitsToDouble(long):double
         -53 (-91.379% of base) : System.Private.CoreLib.dasm - System.BitConverter:SingleToInt32Bits(float):int
         -53 (-89.831% of base) : System.Private.CoreLib.dasm - System.BitConverter:DoubleToInt64Bits(double):long
        -189 (-87.907% of base) : System.Private.CoreLib.dasm - System.MathF:<CopySign>g__SoftwareFallback|36_0(float,float):float
         -94 (-81.739% of base) : System.Private.CoreLib.dasm - System.MathF:CopySign(float,float):float
        -187 (-80.603% of base) : System.Private.CoreLib.dasm - System.Math:<CopySign>g__SoftwareFallback|46_0(double,double):double
         -53 (-80.303% of base) : System.Private.CoreLib.dasm - System.Single:IsNegative(float):bool
         -54 (-79.412% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(double):System.Runtime.Intrinsics.Vector128`1[Double]
         -54 (-79.412% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:CreateScalar(float):System.Runtime.Intrinsics.Vector128`1[Single]
         -53 (-77.941% of base) : System.Private.CoreLib.dasm - System.Double:IsNegative(double):bool
        -115 (-75.163% of base) : System.Private.CoreLib.dasm - System.Math:Min(float,float):float
         -24 (-75.000% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetByteVector128SpanLength(long,int):long
         -24 (-75.000% of base) : System.Private.CoreLib.dasm - System.SpanHelpers:GetByteVector256SpanLength(long,int):long
         -57 (-75.000% of base) : System.Private.CoreLib.dasm - System.BitConverter:ToSingle(System.Byte[],int):float
         -57 (-75.000% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:ReadSingle(long):float:this
         -61 (-74.390% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:Write(long,float):this
         -61 (-74.390% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:Write(long,double):this
         -57 (-74.026% of base) : System.Private.CoreLib.dasm - System.BitConverter:ToDouble(System.Byte[],int):double
         -57 (-74.026% of base) : System.Private.CoreLib.dasm - System.IO.UnmanagedMemoryAccessor:ReadDouble(long):double:this

221 total methods with Code Size differences (215 improved, 6 regressed), 28461 unchanged.
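To make the rows above easier to read: each one has the form "delta bytes (percent of base)", so the baseline size of a method can be recovered by inverting the percentage. The sketch below is only an illustration of that arithmetic. The helper name `base_size` is hypothetical and not part of jit-diff, and the three rows are copied from the output above.

```python
# Each jit-diff row reads "<delta bytes> (<percent>% of base)", where
# percent = 100 * delta / base. Inverting that recovers the baseline size.
# `base_size` is a hypothetical helper for illustration, not a jit-diff API.

def base_size(delta_bytes: int, percent_of_base: float) -> int:
    """Baseline method size in bytes implied by one diff row."""
    return round(100 * delta_bytes / percent_of_base)

# Rows copied from the output above: (label, delta in bytes, percent of base).
rows = [
    ("Vector128<float>.GetHashCode", 178, 136.923),
    ("Vector128<int>.GetHashCode", 85, 85.859),
    ("ASCIIUtility.NarrowUtf16ToAscii_Sse2", -410, -32.985),
]

for label, delta, pct in rows:
    base = base_size(delta, pct)
    print(f"{label}: {base} -> {base + delta} bytes")
```

For NarrowUtf16ToAscii_Sse2 this gives 1243 -> 833 bytes, which agrees with the "Total bytes of code" lines at the end of the disassembly diff below. The GetHashCode regressions are large in percentage terms mostly because the baseline methods are small (130 bytes for the Single instantiation).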

As an example, here is what such a change would do to the pre-jitted code of System.Text.ASCIIUtility:NarrowUtf16ToAscii_Sse2(long,long,long):long

@@ -8,75 +8,50 @@
 ;
 ;  V00 arg0         [V00,T02] (  6, 11.50)    long  ->  rsi        
 ;  V01 arg1         [V01,T04] (  8,  8.50)    long  ->  rdi        
-;  V02 arg2         [V02,T09] (  6,  4.50)    long  ->  rbx        
-;* V03 loc0         [V03,T35] (  0,  0   )     int  ->  zero-ref   
-;* V04 loc1         [V04,T36] (  0,  0   )    long  ->  zero-ref   
-;  V05 loc2         [V05    ] (  5,  4.50)  simd16  ->  [rsp+0x110]   do-not-enreg[XS] addr-exposed
-;  V06 loc3         [V06    ] (  5,  4.50)  simd16  ->  [rsp+0x100]   do-not-enreg[XS] addr-exposed
-;  V07 loc4         [V07,T37] ( 17, 19.50)  simd16  ->  mm6        
-;  V08 loc5         [V08,T39] (  8, 11   )  simd16  ->  mm7        
+;  V02 arg2         [V02,T07] (  6,  4.50)    long  ->  rbx        
+;* V03 loc0         [V03,T24] (  0,  0   )     int  ->  zero-ref   
+;* V04 loc1         [V04,T25] (  0,  0   )    long  ->  zero-ref   
+;  V05 loc2         [V05,T30] (  5,  4.50)  simd16  ->  mm6        
+;  V06 loc3         [V06,T31] (  5,  4.50)  simd16  ->  mm7        
+;  V07 loc4         [V07,T26] ( 17, 19.50)  simd16  ->  mm8        
+;  V08 loc5         [V08,T28] (  8, 11   )  simd16  ->  mm9        
 ;  V09 loc6         [V09,T00] ( 16, 29   )    long  ->  r14        
-;  V10 loc7         [V10,T11] (  2,  4.50)    long  ->  rbx        
-;  V11 loc8         [V11,T38] (  3, 12   )  simd16  ->  mm7        
-;  V12 loc9         [V12,T40] (  3,  8   )  simd16  ->  mm8        
+;  V10 loc7         [V10,T09] (  2,  4.50)    long  ->  rbx        
+;  V11 loc8         [V11,T27] (  3, 12   )  simd16  ->  mm9        
+;  V12 loc9         [V12,T29] (  3,  8   )  simd16  ->  mm10        
 ;  V13 OutArgs      [V13    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
-;  V14 tmp1         [V14    ] (  2,  2   )  simd16  ->  [rsp+0xF0]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V15 tmp2         [V15    ] (  2,  2   )  simd16  ->  [rsp+0xE0]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V16 tmp3         [V16    ] (  2,  2   )  simd16  ->  [rsp+0xD0]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V17 tmp4         [V17,T30] (  3,  1.50)     int  ->  rax        
-;  V18 tmp5         [V18    ] (  2,  8   )  simd16  ->  [rsp+0xC0]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V19 tmp6         [V19    ] (  2,  8   )  simd16  ->  [rsp+0xB0]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V20 tmp7         [V20    ] (  2,  2   )  simd16  ->  [rsp+0xA0]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V21 tmp8         [V21    ] (  2,  2   )  simd16  ->  [rsp+0x90]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V22 tmp9         [V22    ] (  2,  2   )  simd16  ->  [rsp+0x80]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V23 tmp10        [V23    ] (  2,  2   )  simd16  ->  [rsp+0x70]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V24 tmp11        [V24    ] (  2,  2   )  simd16  ->  [rsp+0x60]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
-;  V25 tmp12        [V25    ] (  2,  2   )  simd16  ->  [rsp+0x50]   do-not-enreg[XS] addr-exposed "struct address for call/obj"
+;  V14 tmp1         [V14,T19] (  3,  1.50)     int  ->  rax        
+;* V15 tmp2         [V15    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
+;* V16 tmp3         [V16    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
+;  V17 tmp4         [V17,T11] (  2,  2.50)     ref  ->  r15         class-hnd "Inlining Arg"
+;  V18 tmp5         [V18,T12] (  2,  2.50)     ref  ->  r14         class-hnd "Inlining Arg"
+;* V19 tmp6         [V19    ] (  0,  0   )    long  ->  zero-ref    "NewObj constructor temp"
+;* V20 tmp7         [V20    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
+;* V21 tmp8         [V21    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
+;  V22 tmp9         [V22,T20] (  2,  1.50)     ref  ->  r15         class-hnd "Inlining Arg"
+;* V23 tmp10        [V23    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
+;* V24 tmp11        [V24    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
+;  V25 tmp12        [V25,T21] (  2,  1.50)     ref  ->  r15         class-hnd "Inlining Arg"
 ;* V26 tmp13        [V26    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
 ;* V27 tmp14        [V27    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
-;  V28 tmp15        [V28,T13] (  2,  2.50)     ref  ->  r15         class-hnd "Inlining Arg"
-;  V29 tmp16        [V29,T14] (  2,  2.50)     ref  ->  r14         class-hnd "Inlining Arg"
-;* V30 tmp17        [V30    ] (  0,  0   )    long  ->  zero-ref    "NewObj constructor temp"
-;* V31 tmp18        [V31    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
-;* V32 tmp19        [V32    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
-;  V33 tmp20        [V33,T31] (  2,  1.50)     ref  ->  r15         class-hnd "Inlining Arg"
-;* V34 tmp21        [V34    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
-;* V35 tmp22        [V35    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
-;  V36 tmp23        [V36,T32] (  2,  1.50)     ref  ->  r15         class-hnd "Inlining Arg"
-;* V37 tmp24        [V37    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
-;* V38 tmp25        [V38    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
-;  V39 tmp26        [V39,T33] (  2,  1.50)     ref  ->  r15         class-hnd "Inlining Arg"
-;* V40 tmp27        [V40    ] (  0,  0   )    long  ->  zero-ref    "NewObj constructor temp"
-;  V41 tmp28        [V41,T01] (  2, 16   )    bool  ->  rax         "Inlining Arg"
-;  V42 tmp29        [V42,T05] (  2, 10   )     ref  ->  r12         class-hnd "Inlining Arg"
-;* V43 tmp30        [V43    ] (  0,  0   )    long  ->  zero-ref    "NewObj constructor temp"
-;  V44 tmp31        [V44,T15] (  2,  2   )    bool  ->  rax         "Inlining Arg"
-;  V45 tmp32        [V45,T34] (  2,  1.50)     ref  ->  rsi         class-hnd "Inlining Arg"
-;  V46 tmp33        [V46,T25] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
-;  V47 tmp34        [V47    ] (  8, 14   )  simd16  ->  [rsp+0x40]   do-not-enreg[XS] addr-exposed "by-value struct argument"
-;  V48 tmp35        [V48,T16] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V49 tmp36        [V49    ] (  8, 14   )  simd16  ->  [rsp+0x30]   do-not-enreg[XS] addr-exposed "by-value struct argument"
-;  V50 tmp37        [V50,T17] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V51 tmp38        [V51    ] (  6,  6   )  simd16  ->  [rsp+0x20]   do-not-enreg[XS] addr-exposed "by-value struct argument"
-;  V52 tmp39        [V52,T18] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V53 tmp40        [V53,T19] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V54 tmp41        [V54,T20] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V55 tmp42        [V55,T21] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V56 tmp43        [V56,T26] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
-;  V57 tmp44        [V57,T27] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
-;  V58 tmp45        [V58,T28] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
-;  V59 tmp46        [V59,T07] (  2,  8   )    long  ->  rcx         "argument with side effect"
-;  V60 tmp47        [V60,T08] (  2,  8   )    long  ->  rcx         "argument with side effect"
-;  V61 tmp48        [V61,T10] (  3,  6   )     ref  ->  rcx         "argument with side effect"
-;  V62 tmp49        [V62,T22] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V63 tmp50        [V63,T23] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V64 tmp51        [V64,T29] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
-;  V65 tmp52        [V65,T24] (  2,  2   )    long  ->  rcx         "argument with side effect"
-;  V66 cse0         [V66,T12] (  3,  3   )     ref  ->  r14         "CSE - conservative"
-;  V67 cse1         [V67,T06] (  7,  8   )    long  ->  rbp         "CSE - aggressive"
-;  V68 cse2         [V68,T03] (  6, 13.50)    long  ->  r15         "CSE - aggressive"
+;  V28 tmp15        [V28,T22] (  2,  1.50)     ref  ->  r15         class-hnd "Inlining Arg"
+;* V29 tmp16        [V29    ] (  0,  0   )    long  ->  zero-ref    "NewObj constructor temp"
+;  V30 tmp17        [V30,T01] (  2, 16   )    bool  ->  rax         "Inlining Arg"
+;  V31 tmp18        [V31,T05] (  2, 10   )     ref  ->  r12         class-hnd "Inlining Arg"
+;* V32 tmp19        [V32    ] (  0,  0   )    long  ->  zero-ref    "NewObj constructor temp"
+;  V33 tmp20        [V33,T13] (  2,  2   )    bool  ->  rax         "Inlining Arg"
+;  V34 tmp21        [V34,T23] (  2,  1.50)     ref  ->  rsi         class-hnd "Inlining Arg"
+;  V35 tmp22        [V35,T14] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
+;  V36 tmp23        [V36,T15] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
+;  V37 tmp24        [V37,T16] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
+;  V38 tmp25        [V38,T17] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
+;  V39 tmp26        [V39,T08] (  3,  6   )     ref  ->  rcx         "argument with side effect"
+;  V40 tmp27        [V40,T18] (  3,  1.50)     ref  ->  rcx         "argument with side effect"
+;  V41 cse0         [V41,T10] (  3,  3   )     ref  ->  r14         "CSE - aggressive"
+;  V42 cse1         [V42,T06] (  7,  8   )    long  ->  rbp         "CSE - aggressive"
+;  V43 cse2         [V43,T03] (  6, 13.50)    long  ->  r15         "CSE - aggressive"
 ;
-; Lcl frame size = 336
+; Lcl frame size = 112
 
 G_M2855_IG01:
        push     r15
@@ -86,14 +61,16 @@ G_M2855_IG01:
        push     rsi
        push     rbp
        push     rbx
-       sub      rsp, 336
-       movaps   qword ptr [rsp+140H], xmm6
-       movaps   qword ptr [rsp+130H], xmm7
-       movaps   qword ptr [rsp+120H], xmm8
+       sub      rsp, 112
+       movaps   qword ptr [rsp+60H], xmm6
+       movaps   qword ptr [rsp+50H], xmm7
+       movaps   qword ptr [rsp+40H], xmm8
+       movaps   qword ptr [rsp+30H], xmm9
+       movaps   qword ptr [rsp+20H], xmm10
        mov      rsi, rcx
        mov      rdi, rdx
        mov      rbx, r8
-						;; bbWeight=1    PerfScore 17.00
+						;; bbWeight=1    PerfScore 23.00
 G_M2855_IG02:
        mov      rbp, qword ptr [(reloc)]
        mov      r14, gword ptr [rbp]
@@ -111,28 +88,25 @@ G_M2855_IG03:
        call     qword ptr [rax+32]System.Diagnostics.DebugProvider:Fail(System.String,System.String):this
 						;; bbWeight=0.25 PerfScore 3.13
 G_M2855_IG04:
-       lea      rcx, [rsp+110H]
-       mov      edx, -128
-       call     [System.Runtime.Intrinsics.Vector128:Create(short):System.Runtime.Intrinsics.Vector128`1[Int16]]
-       lea      rcx, [rsp+100H]
-       mov      edx, 0x7F80
-       call     [System.Runtime.Intrinsics.Vector128:Create(ushort):System.Runtime.Intrinsics.Vector128`1[UInt16]]
-       movdqu   xmm6, xmmword ptr [rsi]
+       movups   xmm6, xmmword ptr [reloc @RWD16]
+       movups   xmm7, xmmword ptr [reloc @RWD48]
+       movdqu   xmm8, xmmword ptr [rsi]
        call     [System.Runtime.Intrinsics.X86.Sse41:get_IsSupported():bool]
        test     al, al
        je       SHORT G_M2855_IG07
-						;; bbWeight=1    PerfScore 13.75
+						;; bbWeight=1    PerfScore 12.25
 G_M2855_IG05:
-       movaps   xmm0, xmmword ptr [rsp+110H]
-       ptest    xmm6, xmm0
-       je       G_M2855_IG09
+       ptest    xmm8, xmm6
+       je       SHORT G_M2855_IG09
        xor      rax, rax
-						;; bbWeight=0.50 PerfScore 3.63
+						;; bbWeight=0.50 PerfScore 2.13
 G_M2855_IG06:
-       movaps   xmm6, qword ptr [rsp+140H]
-       movaps   xmm7, qword ptr [rsp+130H]
-       movaps   xmm8, qword ptr [rsp+120H]
-       add      rsp, 336
+       movaps   xmm6, qword ptr [rsp+60H]
+       movaps   xmm7, qword ptr [rsp+50H]
+       movaps   xmm8, qword ptr [rsp+40H]
+       movaps   xmm9, qword ptr [rsp+30H]
+       movaps   xmm10, qword ptr [rsp+20H]
+       add      rsp, 112
        pop      rbx
        pop      rbp
        pop      rsi
@@ -141,30 +115,22 @@ G_M2855_IG06:
        pop      r14
        pop      r15
        ret      
-						;; bbWeight=0.50 PerfScore 8.38
+						;; bbWeight=0.50 PerfScore 12.38
 G_M2855_IG07:
-       lea      rcx, [rsp+F0H]
-       movaps   xmmword ptr [rsp+40H], xmm6
-       lea      rdx, bword ptr [rsp+40H]
-       call     [System.Runtime.Intrinsics.Vector128:AsUInt16(System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[UInt16]]
-       lea      rcx, [rsp+E0H]
-       movaps   xmm0, xmmword ptr [rsp+F0H]
-       movaps   xmm1, xmmword ptr [rsp+100H]
-       paddusw  xmm0, xmm1
-       movaps   xmmword ptr [rsp+30H], xmm0
-       lea      rdx, bword ptr [rsp+30H]
-       call     [System.Runtime.Intrinsics.Vector128:AsByte(System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]]
-       movaps   xmm0, xmmword ptr [rsp+E0H]
+       movaps   xmm0, xmm8
+       paddusw  xmm0, xmm7
        pmovmskb  eax, xmm0
        test     eax, 0xAAAA
        je       SHORT G_M2855_IG09
        xor      rax, rax
-						;; bbWeight=0.50 PerfScore 11.92
+						;; bbWeight=0.50 PerfScore 1.54
 G_M2855_IG08:
-       movaps   xmm6, qword ptr [rsp+140H]
-       movaps   xmm7, qword ptr [rsp+130H]
-       movaps   xmm8, qword ptr [rsp+120H]
-       add      rsp, 336
+       movaps   xmm6, qword ptr [rsp+60H]
+       movaps   xmm7, qword ptr [rsp+50H]
+       movaps   xmm8, qword ptr [rsp+40H]
+       movaps   xmm9, qword ptr [rsp+30H]
+       movaps   xmm10, qword ptr [rsp+20H]
+       add      rsp, 112
        pop      rbx
        pop      rbp
        pop      rsi
@@ -173,55 +139,34 @@ G_M2855_IG08:
        pop      r14
        pop      r15
        ret      
-						;; bbWeight=0.50 PerfScore 8.38
+						;; bbWeight=0.50 PerfScore 12.38
 G_M2855_IG09:
-       movaps   xmm7, xmm6
-       packuswb  xmm7, xmm6
-       lea      rcx, [rsp+D0H]
-       movaps   xmmword ptr [rsp+20H], xmm7
-       lea      rdx, bword ptr [rsp+20H]
-       call     [System.Runtime.Intrinsics.Vector128:AsUInt64(System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt64]]
-       movaps   xmm0, xmmword ptr [rsp+D0H]
-       movq     xmmword ptr [rdi], xmm0
+       movaps   xmm9, xmm8
+       packuswb  xmm9, xmm8
+       movq     xmmword ptr [rdi], xmm9
        mov      r14d, 8
        test     dil, 8
-       jne      G_M2855_IG12
-       movdqu   xmm6, xmmword ptr [rsi+16]
+       jne      SHORT G_M2855_IG12
+       movdqu   xmm8, xmmword ptr [rsi+16]
        call     [System.Runtime.Intrinsics.X86.Sse41:get_IsSupported():bool]
        test     al, al
        je       SHORT G_M2855_IG10
-       movaps   xmm0, xmmword ptr [rsp+110H]
-       ptest    xmm6, xmm0
+       ptest    xmm8, xmm6
        je       SHORT G_M2855_IG11
        jmp      G_M2855_IG28
-						;; bbWeight=0.50 PerfScore 14.00
+						;; bbWeight=0.50 PerfScore 8.00
 G_M2855_IG10:
-       lea      rcx, [rsp+70H]
-       movaps   xmmword ptr [rsp+40H], xmm6
-       lea      rdx, bword ptr [rsp+40H]
-       call     [System.Runtime.Intrinsics.Vector128:AsUInt16(System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[UInt16]]
-       lea      rcx, [rsp+60H]
-       movaps   xmm0, xmmword ptr [rsp+70H]
-       movaps   xmm1, xmmword ptr [rsp+100H]
-       paddusw  xmm0, xmm1
-       movaps   xmmword ptr [rsp+30H], xmm0
-       lea      rdx, bword ptr [rsp+30H]
-       call     [System.Runtime.Intrinsics.Vector128:AsByte(System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]]
-       movaps   xmm0, xmmword ptr [rsp+60H]
-       pmovmskb  ecx, xmm0
-       test     ecx, 0xAAAA
+       movaps   xmm0, xmm8
+       paddusw  xmm0, xmm7
+       pmovmskb  eax, xmm0
+       test     eax, 0xAAAA
        jne      G_M2855_IG28
-						;; bbWeight=0.50 PerfScore 11.79
+						;; bbWeight=0.50 PerfScore 1.42
 G_M2855_IG11:
-       movaps   xmm7, xmm6
-       packuswb  xmm7, xmm6
-       lea      rcx, [rsp+50H]
-       movaps   xmmword ptr [rsp+20H], xmm7
-       lea      rdx, bword ptr [rsp+20H]
-       call     [System.Runtime.Intrinsics.Vector128:AsUInt64(System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt64]]
-       movaps   xmm0, xmmword ptr [rsp+50H]
-       movq     xmmword ptr [rdi+8], xmm0
-						;; bbWeight=0.50 PerfScore 5.63
+       movaps   xmm9, xmm8
+       packuswb  xmm9, xmm8
+       movq     xmmword ptr [rdi+8], xmm9
+						;; bbWeight=0.50 PerfScore 1.13
 G_M2855_IG12:
        mov      rax, rdi
        and      rax, 15
@@ -288,42 +233,30 @@ G_M2855_IG20:
        sub      rbx, 16
 						;; bbWeight=0.50 PerfScore 0.13
 G_M2855_IG21:
-       movdqu   xmm6, xmmword ptr [rsi+2*r14]
-       movdqu   xmm7, xmmword ptr [rsi+2*r14+16]
-       movaps   xmm8, xmm6
-       por      xmm8, xmm7
+       movdqu   xmm8, xmmword ptr [rsi+2*r14]
+       movdqu   xmm9, xmmword ptr [rsi+2*r14+16]
+       movaps   xmm10, xmm8
+       por      xmm10, xmm9
        call     [System.Runtime.Intrinsics.X86.Sse41:get_IsSupported():bool]
        test     al, al
        je       SHORT G_M2855_IG24
 						;; bbWeight=4    PerfScore 35.33
 G_M2855_IG22:
-       movaps   xmm0, xmmword ptr [rsp+110H]
-       ptest    xmm8, xmm0
+       ptest    xmm10, xmm6
        je       SHORT G_M2855_IG25
-						;; bbWeight=2    PerfScore 14.00
+						;; bbWeight=2    PerfScore 8.00
 G_M2855_IG23:
        jmp      G_M2855_IG30
 						;; bbWeight=0.50 PerfScore 1.00
 G_M2855_IG24:
-       lea      rcx, [rsp+C0H]
-       movaps   xmmword ptr [rsp+40H], xmm8
-       lea      rdx, bword ptr [rsp+40H]
-       call     [System.Runtime.Intrinsics.Vector128:AsUInt16(System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[UInt16]]
-       lea      rcx, [rsp+B0H]
-       movaps   xmm0, xmmword ptr [rsp+C0H]
-       movaps   xmm1, xmmword ptr [rsp+100H]
-       paddusw  xmm0, xmm1
-       movaps   xmmword ptr [rsp+30H], xmm0
-       lea      rdx, bword ptr [rsp+30H]
-       call     [System.Runtime.Intrinsics.Vector128:AsByte(System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]]
-       movaps   xmm0, xmmword ptr [rsp+B0H]
-       pmovmskb  eax, xmm0
+       paddusw  xmm10, xmm7
+       pmovmskb  eax, xmm10
        test     eax, 0xAAAA
        jne      G_M2855_IG30
-						;; bbWeight=2    PerfScore 47.17
+						;; bbWeight=2    PerfScore 5.17
 G_M2855_IG25:
-       packuswb  xmm6, xmm7
-       movaps   xmm7, xmm6
+       packuswb  xmm8, xmm9
+       movaps   xmm9, xmm8
        lea      r15, [rdi+r14]
        test     r15b, 15
        sete     al
@@ -343,7 +276,7 @@ G_M2855_IG26:
        call     qword ptr [rax+32]System.Diagnostics.DebugProvider:Fail(System.String,System.String):this
 						;; bbWeight=1    PerfScore 16.25
 G_M2855_IG27:
-       movdqa   xmmword ptr [r15], xmm7
+       movdqa   xmmword ptr [r15], xmm9
        add      r14, 16
        cmp      r14, rbx
        jbe      G_M2855_IG21
@@ -352,10 +285,12 @@ G_M2855_IG28:
        mov      rax, r14
 						;; bbWeight=0.50 PerfScore 0.13
 G_M2855_IG29:
-       movaps   xmm6, qword ptr [rsp+140H]
-       movaps   xmm7, qword ptr [rsp+130H]
-       movaps   xmm8, qword ptr [rsp+120H]
-       add      rsp, 336
+       movaps   xmm6, qword ptr [rsp+60H]
+       movaps   xmm7, qword ptr [rsp+50H]
+       movaps   xmm8, qword ptr [rsp+40H]
+       movaps   xmm9, qword ptr [rsp+30H]
+       movaps   xmm10, qword ptr [rsp+20H]
+       add      rsp, 112
        pop      rbx
        pop      rbp
        pop      rsi
@@ -364,36 +299,24 @@ G_M2855_IG29:
        pop      r14
        pop      r15
        ret      
-						;; bbWeight=0.50 PerfScore 8.38
+						;; bbWeight=0.50 PerfScore 12.38
 G_M2855_IG30:
        call     [System.Runtime.Intrinsics.X86.Sse41:get_IsSupported():bool]
        test     al, al
        je       SHORT G_M2855_IG31
-       movaps   xmm0, xmmword ptr [rsp+110H]
-       ptest    xmm6, xmm0
+       ptest    xmm8, xmm6
        je       SHORT G_M2855_IG32
        jmp      SHORT G_M2855_IG28
-						;; bbWeight=0.50 PerfScore 6.63
+						;; bbWeight=0.50 PerfScore 5.13
 G_M2855_IG31:
-       lea      rcx, [rsp+A0H]
-       movaps   xmmword ptr [rsp+40H], xmm6
-       lea      rdx, bword ptr [rsp+40H]
-       call     [System.Runtime.Intrinsics.Vector128:AsUInt16(System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[UInt16]]
-       lea      rcx, [rsp+90H]
-       movaps   xmm0, xmmword ptr [rsp+A0H]
-       movaps   xmm1, xmmword ptr [rsp+100H]
-       paddusw  xmm0, xmm1
-       movaps   xmmword ptr [rsp+30H], xmm0
-       lea      rdx, bword ptr [rsp+30H]
-       call     [System.Runtime.Intrinsics.Vector128:AsByte(System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]]
-       movaps   xmm0, xmmword ptr [rsp+90H]
-       pmovmskb  eax, xmm0
+       paddusw  xmm7, xmm8
+       pmovmskb  eax, xmm7
        test     eax, 0xAAAA
-       jne      G_M2855_IG28
-						;; bbWeight=0.50 PerfScore 11.79
+       jne      SHORT G_M2855_IG28
+						;; bbWeight=0.50 PerfScore 1.29
 G_M2855_IG32:
-       movaps   xmm7, xmm6
-       packuswb  xmm7, xmm6
+       movaps   xmm9, xmm8
+       packuswb  xmm9, xmm8
        lea      r15, [rdi+r14]
        test     r15b, 7
        sete     al
@@ -413,15 +336,15 @@ G_M2855_IG33:
        call     qword ptr [rax+32]System.Diagnostics.DebugProvider:Fail(System.String,System.String):this
 						;; bbWeight=0.25 PerfScore 4.06
 G_M2855_IG34:
-       lea      rcx, [rsp+80H]
-       movaps   xmmword ptr [rsp+20H], xmm7
-       lea      rdx, bword ptr [rsp+20H]
-       call     [System.Runtime.Intrinsics.Vector128:AsUInt64(System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt64]]
-       movaps   xmm0, xmmword ptr [rsp+80H]
-       movq     xmmword ptr [r15], xmm0
+       movq     xmmword ptr [r15], xmm9
        add      r14, 8
        jmp      G_M2855_IG28
-						;; bbWeight=0.50 PerfScore 6.13
+						;; bbWeight=0.50 PerfScore 1.63
+RWD00  db	000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h
+RWD16  db	080h, 0FFh, 080h, 0FFh, 080h, 0FFh, 080h, 0FFh, 080h, 0FFh, 080h, 0FFh, 080h, 0FFh, 080h, 0FFh
+RWD32  db	000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h
+RWD48  db	080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh
+
 
-; Total bytes of code 1243, prolog size 42, PerfScore 438.33, (MethodHash=d994f4d8) for method System.Text.ASCIIUtility:NarrowUtf16ToAscii_Sse2(long,long,long):long
+; Total bytes of code 833, prolog size 42, PerfScore 316.58, (MethodHash=d994f4d8) for method System.Text.ASCIIUtility:NarrowUtf16ToAscii_Sse2(long,long,long):long
 ; ============================================================
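The drop in "Lcl frame size" from 336 to 112 can be accounted for directly from the local variable tables in the diff. The sketch below is simple bookkeeping read off this one listing, not a description of how the JIT lays out frames in general. The function name and parameters are made up for illustration.

```python
# Frame accounting for NarrowUtf16ToAscii_Sse2, read off the listing above.
# This is an illustration for this one method, not a general JIT layout rule.

SIMD_SLOT = 16      # bytes per simd16 stack local or saved XMM register
OUTGOING_ARGS = 32  # the "OutgoingArgSpace" lclBlk, present in both versions

def frame_size(simd_stack_locals: int, saved_xmm: int) -> int:
    """Local frame size in bytes for the given number of 16-byte slots."""
    return OUTGOING_ARGS + SIMD_SLOT * (simd_stack_locals + saved_xmm)

# Before: 16 simd16 locals live on the stack (V05, V06, V14-V16, V18-V25,
# V47, V49, V51), and xmm6-xmm8 are saved in the prolog.
before = frame_size(simd_stack_locals=16, saved_xmm=3)

# After: no simd16 local lives on the stack, and xmm6-xmm10 are saved.
after = frame_size(simd_stack_locals=0, saved_xmm=5)

print(before, after, before - after)
```

This reproduces 336 and 112. All 16 address-exposed slots exist only to pass vectors to, or receive them from, the non-inlined Create and As* calls. Once those calls disappear, the values stay in registers at the cost of saving two more callee-saved XMM registers, which is why the prolog and epilog PerfScore goes up slightly while the total goes down.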

echesakov on Aug 11, 2020

Right now, all Vector128 and Vector256 methods are treated as regular calls on x86/x64, per this condition

https://github.com/dotnet/runtime/blob/189e1aa8f91632c196fe0e6cb1410a85bb7d2283/src/coreclr/src/zap/zapinfo.cpp#L2173-L2178

@davidwrighton @jkotas

Do you think that we can relax this condition and allow some of the Vector128 methods to be treated as intrinsics when compiling System.Private.CoreLib?

As far as I understand, Sse and Sse2 are required ISAs on x86/x64 platforms, so it appears to be safe to do this for at least the following ones:

Vector128<T>.As*
Vector128<T>.get_Count
Vector128<T>.get_Zero
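One way to see why these three look safe to expand: none of them needs to compute anything with the vector contents. The sketch below is a rough Python model of their semantics, assuming a Vector128 is just 16 raw bytes. The names `count`, `zero` and `as_lanes`, and the `LANE_FORMATS` table, are made up for this illustration and are not the .NET API.

```python
import struct

VECTOR_BYTES = 16  # a Vector128 holds 128 bits

# struct format characters standing in for the element type T.
# This mapping is an assumption of the model, not part of .NET.
LANE_FORMATS = {"byte": "B", "short": "h", "ushort": "H",
                "uint": "I", "ulong": "Q", "float": "f"}

def count(elem: str) -> int:
    """Model of Vector128<T>.Count: a constant that depends only on sizeof(T)."""
    return VECTOR_BYTES // struct.calcsize("<" + LANE_FORMATS[elem])

def zero() -> bytes:
    """Model of Vector128<T>.Zero: the same all-zero bits for every T."""
    return bytes(VECTOR_BYTES)

def as_lanes(bits: bytes, elem: str) -> tuple:
    """Model of As*: the 16 bytes are unchanged, only the lane view differs."""
    return struct.unpack("<" + str(count(elem)) + LANE_FORMATS[elem], bits)

# Vector128.Create((short)-128) as raw little-endian bytes: 80 FF repeated,
# the same pattern as the RWD16 constant in the listing above.
v = struct.pack("<8h", *([-128] * 8))
print(v.hex())
print(as_lanes(v, "short")[0], as_lanes(v, "ushort")[0], count("byte"))
```

In this model an As* call returns exactly the bits it was given, which is consistent with the diff above, where each AsUInt16, AsByte and AsUInt64 call and its stack round trip disappears entirely once the method is expanded.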

echesakov on Aug 6, 2020