runtime: Perf regression for Math.Min and Math.Max
The Math.Min
and Math.Max
functions that take double
/float
were updated to be IEEE compliant in https://github.com/dotnet/coreclr/pull/20912. However, the JIT does not do a great job optimizing this code and it has resulted in a regression.
For example, in the Ray Tracer that Matt Warren ported to C# as part of https://mattwarren.org/2019/03/01/Is-CSharp-a-low-level-language/#using-mathf-functions-instead-of-math, I see the following locally:
- C++ 19.16: 26sec
- C# netcoreapp2.2: 36sec
- C# netcoreapp3.0: 57sec
Reverting the change back to the original implementation gives:
- C# netcoreapp3.0: 31sec
The function could likely be intrinsified in order to help with the codegen here. This can’t be done using exclusively minss
as that always returns the second operand if either of the operands is NaN
, so we could emulate this branchlessly using the recommendation in the architecture manual:
If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN in either source operand be returned, the action of MINSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.
I believe that should be something similar to the following:
if (Sse.IsSupported)
{
var lhs = Vector128.CreateScalarUnsafe(val1);
var rhs = Vector128.CreateScalarUnsafe(val2);
var tmp1 = Sse.MinScalar(rhs, lhs);
var tmp2 = Sse.CompareUnorderedScalar(lhs, lhs);
if (Avx.IsSupported)
{
return Avx.BlendVariable(tmp1, rhs, tmp2).ToScalar();
}
else
{
var tmp3 = Sse.And(tmp2, rhs);
tmp2 = Sse.AndNot(tmp2, tmp1);
return Sse.Or(tmp2, tmp3).ToScalar();
}
}
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 20 (20 by maintainers)
I can improve NaN comparisons in the JIT. Though last time I tried it didn’t seem to make a difference.