runtime: Perf regression for Math.Min and Math.Max

The Math.Min and Math.Max functions that take double/float were updated to be IEEE compliant in https://github.com/dotnet/coreclr/pull/20912. However, the JIT does not do a great job optimizing this code and it has resulted in a regression.

For example, in the Ray Tracer that Matt Warren ported to C# as part of https://mattwarren.org/2019/03/01/Is-CSharp-a-low-level-language/#using-mathf-functions-instead-of-math, I see the following locally:

  • C++ 19.16: 26sec
  • C# netcoreapp2.2: 36sec
  • C# netcoreapp3.0: 57sec

Reverting the change back to the original implementation gives:

  • C# netcoreapp3.0: 31sec

The function could likely be intrinsified in order to help with the codegen here. This can’t be done using exclusively minss as that always returns the second operand if either of the operands is NaN, so we could emulate this branchlessly using the recommendation in the architecture manual:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN in either source operand be returned, the action of MINSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

I believe that should be something similar to the following:

if (Sse.IsSupported)
{
    var lhs = Vector128.CreateScalarUnsafe(val1);
    var rhs = Vector128.CreateScalarUnsafe(val2);

    var tmp1 = Sse.MinScalar(rhs, lhs);
    var tmp2 = Sse.CompareUnorderedScalar(lhs, lhs);

    if (Avx.IsSupported)
    {
        return Avx.BlendVariable(tmp1, rhs, tmp2).ToScalar();
    }
    else
    {
        var tmp3 = Sse.And(tmp2, rhs);
        tmp2 = Sse.AndNot(tmp2, tmp1);
        return Sse.Or(tmp2, tmp3).ToScalar();
    }
}

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 20 (20 by maintainers)

Most upvoted comments

The root problem is the JIT doesn’t handle NaN comparisons very well, which results in a bunch of branches in a small window of code

I can improve NaN comparisons in the JIT. Though last time I tried it didn’t seem to make a difference.