runtime: [Perf][Windows_NT] Investigate the improvement/regressions on System/IO/Tests/PerfStreamWriter
From release/2.0.0 to release/2.1, the tests changed as follows:
WriteCharArray(writeLength: 100) // Improved ~8%
WriteCharArray(writeLength: 2) // Improved ~7%
WritePartialCharArray(writeLength: 100) // Improved ~16%
WritePartialCharArray(writeLength: 2) // Regressed ~25%
WriteString(writeLength: 2) // Regressed ~4%
About this issue
- State: closed
- Created 6 years ago
- Comments: 21 (21 by maintainers)
Jan made the change to inline the string to span conversion and it’s been picked up by CoreFx. Master perf now looking good for both Windows & Ubuntu. Note this is the version that is timing the entire operation – no bits are escaping from the timing loop.
Should we make the string->span conversion AggressiveInlining? That operation is done a lot, both with AsSpan and with the implicit cast operator method. (And thanks for looking, Andy!)
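For reference, a minimal sketch of the two conversion forms mentioned above, plus what an AggressiveInlining hint looks like on a conversion method. ToSpan and CountChars are hypothetical helpers made up for illustration; this is not the actual framework change.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SpanConversionSketch
{
    // Hypothetical helper (not a framework API): forwards to the AsSpan
    // extension method and asks the JIT to inline it into callers.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static ReadOnlySpan<char> ToSpan(string text) => text.AsSpan();

    // Hypothetical consumer that only needs a span.
    public static int CountChars(ReadOnlySpan<char> chars) => chars.Length;

    public static void Main()
    {
        string s = "ab";
        int viaAsSpan = CountChars(s.AsSpan());  // explicit AsSpan call
        int viaImplicit = CountChars(s);         // implicit string -> ReadOnlySpan<char> operator
        int viaHelper = CountChars(ToSpan(s));   // hypothetical inlined helper
        Console.WriteLine($"{viaAsSpan} {viaImplicit} {viaHelper}"); // prints "2 2 2"
    }
}
```

The attribute is only a hint; whether the JIT actually inlines the call still depends on the runtime version and the call site.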
cc: @ahsonkhan
Some data from various combinations, measured locally:
So if we update the benchmark (via a noinline wrapper) to not allow the prolog costs to escape the timing loop, the modded version of the stream code actually appears to be a bit slower than the current version. But if we let the costs escape (as happens now) the modded version is faster.
I think we are better off leaving things as is – the cases that can benefit from modding (or reverting the change that caused the regression) are ones that look more or less exactly like the benchmark: it must be a method that repeatedly calls Write(string) in a long-running loop on very short strings and also directly constructs the StreamWriter in a way that visibly connects it to the call to Write.
[Edit: added impact of inlining the string->span conversion]
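To make the "noinline wrapper" idea concrete, here is a rough sketch of a timing loop where each Write(string) call goes through a wrapper the JIT is told not to inline, so the per-call costs stay inside the timed region. WriteStringTiming, WriteOnce, and Measure are made-up names, and this is not the actual benchmark code from the perf test suite.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.CompilerServices;

public static class WriteStringTiming
{
    // Hypothetical wrapper: NoInlining keeps the call to Write(string) from
    // being merged into the benchmark method that contains the timed loop.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void WriteOnce(StreamWriter writer, string value) => writer.Write(value);

    // Times `iterations` writes of `value` and reports how many bytes landed
    // in the backing stream, as a sanity check that the work really happened.
    public static TimeSpan Measure(int iterations, string value, out long bytesWritten)
    {
        using (var stream = new MemoryStream())
        using (var writer = new StreamWriter(stream))
        {
            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                WriteOnce(writer, value);
            }
            stopwatch.Stop();

            writer.Flush();
            bytesWritten = stream.Length; // read before the stream is disposed
            return stopwatch.Elapsed;
        }
    }

    public static void Main()
    {
        TimeSpan elapsed = Measure(1_000_000, "ab", out long bytes);
        Console.WriteLine($"{bytes} bytes in {elapsed.TotalMilliseconds} ms");
    }
}
```

Dropping the wrapper and calling writer.Write(value) directly in the loop gives the other variant, where the JIT is free to inline and move work out of the loop.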
The problem appears to be the [MethodImpl(MethodImplOptions.NoInlining)] that we put onto Write(string) at the last minute. When I remove that, throughput on this Write(twoCharString) benchmark more than doubles.
This is strange, though, as previously while Write(string) would have been inlined, the WriteCore(ReadOnlySpan<>) method it called wouldn’t have been; now the WriteSpan(ReadOnlySpan<>) method it calls will be and the Write(string) won’t, so there’s still the same number of method calls. There must be something more intricate going on…
cc: @AndyAyersMS
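The call shape being described looks roughly like the sketch below: a Write(string) marked NoInlining that forwards to a span-based worker the JIT may inline. TinyWriter is a hypothetical stand-in with a fixed char buffer, not the real StreamWriter source; only the attribute placement is taken from the description above.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a buffered writer; not the real StreamWriter.
public sealed class TinyWriter
{
    private readonly char[] _buffer = new char[256];
    private int _pos;

    // Total number of chars accepted so far (for illustration only).
    public long TotalWritten { get; private set; }

    // The public entry point is kept out of line, as described in the thread.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public void Write(string value)
    {
        if (value != null)
        {
            WriteSpan(value.AsSpan());
        }
    }

    // The span worker is a candidate for inlining into Write(string).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private void WriteSpan(ReadOnlySpan<char> chars)
    {
        while (!chars.IsEmpty)
        {
            if (_pos == _buffer.Length)
            {
                _pos = 0; // stand-in for flushing the buffer to a stream
            }

            int count = Math.Min(chars.Length, _buffer.Length - _pos);
            chars.Slice(0, count).CopyTo(_buffer.AsSpan(_pos));
            _pos += count;
            TotalWritten += count;
            chars = chars.Slice(count);
        }
    }

    public static void Main()
    {
        var writer = new TinyWriter();
        for (int i = 0; i < 1000; i++)
        {
            writer.Write("ab");
        }
        Console.WriteLine(writer.TotalWritten); // prints 2000
    }
}
```

Either way a caller pays for one out-of-line call per Write, which is why the measured difference is surprising.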