runtime: [Perf][Windows_NT] Investigate the improvement/regressions on System/IO/Tests/PerfStreamWriter

From release/2.0.0 to release/2.1 there have been the following changes in the test results:

  WriteCharArray(writeLength: 100)         // Improved ~8%
  WriteCharArray(writeLength: 2)           // Improved ~7%
  WritePartialCharArray(writeLength: 100)  // Improved ~16%
  WritePartialCharArray(writeLength: 2)    // Regressed ~25%
  WriteString(writeLength: 2)              // Regressed ~4%

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

Jan made the change to inline the string->span conversion and it’s been picked up by CoreFx. Master perf is now looking good for both Windows and Ubuntu. Note that this is the benchmark version that times the entire operation: no bits are escaping from the timing loop.

[Edit: added impact of inlining the string->span conversion]

Should we make the string->span conversion AggressiveInlining? That operation is done a lot, both with AsSpan and with the implicit cast operator method.
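
For readers unfamiliar with the attribute being proposed, here is a minimal sketch of what marking a string->span conversion as AggressiveInlining could look like. The class and method names (StringSpanSketch, ToSpan, CountChars) are hypothetical and for illustration only; this is not the CoreFx source, and the attribute is a hint that the JIT may still decline.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class StringSpanSketch
{
    // Hypothetical stand-in for the string->span conversion (assumption, not CoreFx code).
    // AggressiveInlining asks the JIT to inline this method into its callers if it can.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static ReadOnlySpan<char> ToSpan(string text)
    {
        // MemoryExtensions.AsSpan(string) returns an empty span for a null string.
        return text.AsSpan();
    }

    // Example caller: a small conversion like ToSpan is a natural inlining candidate here.
    public static int CountChars(string text)
    {
        ReadOnlySpan<char> span = ToSpan(text);
        return span.Length;
    }
}
```

The point of the hint is that a conversion this small costs more as a call than as inlined code, which matters when it sits on a hot path such as Write(string).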

(And thanks for looking, Andy!)

cc: @ahsonkhan

Some data from various combinations, measured locally:

  StreamWriter                      Benchmark   Perf
  current                           current     286
  modded                            current     161
  current                           noinline    302
  modded                            noinline    342
  current + string->span inlined    current     148

So if we update the benchmark (via a noinline wrapper) to not allow the prolog costs to escape the timing loop, the modded version of the stream code actually appears to be a bit slower than the current version (342 vs. 302). But if we let the costs escape (as happens now), the modded version is faster (161 vs. 286).
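
A rough sketch of one way a noinline wrapper could be applied to such a benchmark is below. It assumes the wrapper surrounds each individual Write call so that the JIT cannot inline the call into the loop and move part of its work outside the timed region; the actual benchmark change is not shown in this thread, and the names WriteStringBenchmarkSketch, WriteOnce, and Measure are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.CompilerServices;

public static class WriteStringBenchmarkSketch
{
    // Hypothetical wrapper (assumption): NoInlining forces a real call on every
    // iteration, so the full cost of Write(string) stays inside the timed loop.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void WriteOnce(StreamWriter writer, string value)
    {
        writer.Write(value);
    }

    // Times `iterations` writes of a two-character string to an in-memory stream.
    public static TimeSpan Measure(int iterations)
    {
        using (var stream = new MemoryStream())
        using (var writer = new StreamWriter(stream))
        {
            Stopwatch timer = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                WriteOnce(writer, "ab");
            }
            timer.Stop();
            return timer.Elapsed;
        }
    }
}
```

Calling WriteStringBenchmarkSketch.Measure(1000000) with and without the wrapper (replace the WriteOnce call with writer.Write("ab")) is the kind of comparison the "current" and "noinline" benchmark columns describe, though absolute numbers will differ from those reported here.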

I think we are better off leaving things as is – the cases that can benefit from modding (or reverting the change that caused the regression) are ones that look more or less exactly like the benchmark: it must be a method that repeatedly calls Write(string) in a long-running loop on very short strings and also directly constructs the StreamWriter in a way that visibly connects it to the call to Write.

[Edit: added impact of inlining the string->span conversion]

The problem appears to be the [MethodImpl(MethodImplOptions.NoInlining)] that we put onto Write(string) at the last minute. When I remove that, throughput on this Write(twoCharString) benchmark more than doubles.

This is strange, though, as previously while Write(string) would have been inlined, the WriteCore(ReadOnlySpan<>) method it called wouldn’t have been; now the WriteSpan(ReadOnlySpan<>) method it calls will be and the Write(string) won’t, so there’s still the same number of method calls. There must be something more intricate going on…

cc: @AndyAyersMS