runtime: List of performance regressions caused by switching to ICU

Before 5.0, we were using ICU only on Unix systems. In 5.0 we have decided to use it on Windows by default as well.

We did this so that string-related globalization APIs behave the same on every OS.

However, this particular change has affected the performance characteristics of many frequently used methods. Some of them have regressed, some have improved.

Recently we have reported a lot of 5.0 regressions related to that. Since we made this change on purpose and are most probably not going to revert the switch, I am opening this issue to track the list of all known regressions. When the list is complete, we will most probably update the 5.0 release docs, close the issue, and label it as won’t fix.

Please feel free to edit the list.

Known changes:

cc @danmosemsft @tarekgh @billwert @DrewScoggins @GrabYourPitchforks @jkotas @safern

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 8
  • Comments: 69 (65 by maintainers)

Most upvoted comments

After the most recent check-in (https://github.com/dotnet/runtime/pull/43065) that Tarek made, we are seeing some good improvements in the lab.

Run Information

Architecture: x64
OS: Windows 10.0.18362
Changes: diff

Regressions in System.Globalization.Tests.StringSearch

| Benchmark | Baseline | Test | Test/Base | Modality | Baseline Outlier |
| --- | --- | --- | --- | --- | --- |
| IndexOf_Word_NotFound | 13.06 μs | 9.15 μs | 0.70 | | True |
| LastIndexOf_Word_NotFound | 18.87 μs | 15.01 μs | 0.80 | | True |
| LastIndexOf_Word_NotFound | 20.55 μs | 16.44 μs | 0.80 | | True |
| IsSuffix_SecondHalf | 22.05 μs | 17.57 μs | 0.80 | | False |
| IsPrefix_FirstHalf | 19.68 μs | 15.59 μs | 0.79 | | False |
| IndexOf_Word_NotFound | 12.96 μs | 9.05 μs | 0.70 | | True |
| IsPrefix_DifferentFirstChar | 34.04 μs | 29.81 μs | 0.88 | | True |
| IndexOf_Word_NotFound | 16.56 μs | 12.95 μs | 0.78 | | True |
| IsSuffix_DifferentLastChar | 39.49 μs | 34.97 μs | 0.89 | | True |
| IndexOf_Word_NotFound | 13.19 μs | 9.30 μs | 0.70 | | True |
| LastIndexOf_Word_NotFound | 19.46 μs | 15.26 μs | 0.78 | | True |
| IndexOf_Word_NotFound | 12.83 μs | 9.16 μs | 0.71 | | False |
| IndexOf_Word_NotFound | 13.19 μs | 9.27 μs | 0.70 | | True |
| LastIndexOf_Word_NotFound | 19.20 μs | 14.99 μs | 0.78 | | False |
| LastIndexOf_Word_NotFound | 19.41 μs | 15.28 μs | 0.79 | | False |
| LastIndexOf_Word_NotFound | 18.57 μs | 14.01 μs | 0.75 | | True |

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Globalization.Tests.StringSearch*'

Histogram

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, None, True))

[ 8961.400 ;  9663.790) | @@@@@@@@@@@@@@
[ 9663.790 ; 10366.179) | 
[10366.179 ; 11068.568) | 
[11068.568 ; 11770.957) | 
[11770.957 ; 12473.347) | 
[12473.347 ; 12726.514) | 
[12726.514 ; 13428.903) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13428.903 ; 14213.285) | @@@

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, None, True))

[14612.596 ; 15349.017) | @@@@@@@@@@@@@@
[15349.017 ; 16056.794) | 
[16056.794 ; 16764.571) | 
[16764.571 ; 17472.348) | 
[17472.348 ; 18217.629) | 
[18217.629 ; 18831.044) | @
[18831.044 ; 19538.820) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19538.820 ; 20122.311) | @@@

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (pl-PL, None, False))

[16251.652 ; 16961.645) | @@@@@@@@@@@@@@
[16961.645 ; 17671.638) | 
[17671.638 ; 18381.630) | 
[18381.630 ; 19091.623) | 
[19091.623 ; 19801.616) | 
[19801.616 ; 20174.942) | 
[20174.942 ; 20884.934) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[20884.934 ; 21604.903) | @@@

System.Globalization.Tests.StringSearch.IsSuffix_SecondHalf(Options: (en-US, IgnoreSymbols, False))

[17072.752 ; 17964.482) | @@@@@@@@@@@@@@
[17964.482 ; 18779.784) | 
[18779.784 ; 19595.085) | 
[19595.085 ; 20410.387) | 
[20410.387 ; 21514.014) | 
[21514.014 ; 22329.316) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[22329.316 ; 23330.116) | @@@@@@@@@@

System.Globalization.Tests.StringSearch.IsPrefix_FirstHalf(Options: (en-US, IgnoreSymbols, False))

[15229.897 ; 15975.829) | @@@@@@@@@@@@@@
[15975.829 ; 16721.761) | 
[16721.761 ; 17467.693) | 
[17467.693 ; 18213.625) | 
[18213.625 ; 18959.557) | 
[18959.557 ; 19302.824) | 
[19302.824 ; 20048.756) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[20048.756 ; 20855.951) | @@@@@

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreSymbols, False))

[ 8797.707 ;  9499.196) | @@@@@@@@@@@@@@
[ 9499.196 ; 10200.684) | 
[10200.684 ; 10902.172) | 
[10902.172 ; 11603.661) | 
[11603.661 ; 12305.149) | 
[12305.149 ; 12684.041) | 
[12684.041 ; 13385.530) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13385.530 ; 14220.333) | @@

System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options: (en-US, IgnoreSymbols, False))

[29477.421 ; 30235.312) | @@@@@@@@@@@@@@
[30235.312 ; 30993.203) | 
[30993.203 ; 31751.095) | 
[31751.095 ; 32508.986) | 
[32508.986 ; 33769.724) | 
[33769.724 ; 35223.347) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (pl-PL, None, False))

[12609.085 ; 13265.894) | @@@@@@@@@@@@@
[13265.894 ; 13768.769) | @
[13768.769 ; 14425.579) | 
[14425.579 ; 15082.389) | 
[15082.389 ; 15739.199) | 
[15739.199 ; 16287.555) | 
[16287.555 ; 16944.365) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[16944.365 ; 17648.441) | @@

System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options: (en-US, IgnoreSymbols, False))

[34563.746 ; 35375.219) | @@@@@@@@@@@@@@
[35375.219 ; 36186.692) | 
[36186.692 ; 36998.165) | 
[36998.165 ; 37809.638) | 
[37809.638 ; 39015.124) | @
[39015.124 ; 39826.597) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[39826.597 ; 40796.807) | @@@@@

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, True))

[ 8890.906 ;  9625.004) | @@@@@@@@@@@@@@
[ 9625.004 ; 10333.195) | 
[10333.195 ; 11041.386) | 
[11041.386 ; 11749.577) | 
[11749.577 ; 12457.769) | 
[12457.769 ; 13023.837) | @
[13023.837 ; 13732.028) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13732.028 ; 14434.832) | @@@@

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, True))

[14805.514 ; 15554.325) | @@@@@@@@@@@@@@
[15554.325 ; 16303.136) | 
[16303.136 ; 17051.948) | 
[17051.948 ; 17800.759) | 
[17800.759 ; 18662.288) | 
[18662.288 ; 19285.851) | @@
[19285.851 ; 20034.662) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[20034.662 ; 20599.685) | @@@@

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, True))

[ 9024.075 ;  9674.928) | @@@@@@@@@@@@@@
[ 9674.928 ; 10325.780) | 
[10325.780 ; 10976.632) | 
[10976.632 ; 11627.484) | 
[11627.484 ; 12278.336) | 
[12278.336 ; 12637.952) | 
[12637.952 ; 13288.804) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13288.804 ; 13899.140) | @@

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, True))

[ 8882.220 ;  9650.252) | @@@@@@@@@@@@@@
[ 9650.252 ; 10352.069) | 
[10352.069 ; 11053.887) | 
[11053.887 ; 11755.704) | 
[11755.704 ; 12457.521) | 
[12457.521 ; 12812.048) | 
[12812.048 ; 13513.865) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13513.865 ; 14286.340) | @@@@@

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, None, True))

[14587.344 ; 15366.223) | @@@@@@@@@@@@@@
[15366.223 ; 16127.006) | 
[16127.006 ; 16887.789) | 
[16887.789 ; 17648.572) | 
[17648.572 ; 18409.355) | 
[18409.355 ; 18787.408) | 
[18787.408 ; 19548.191) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19548.191 ; 20346.411) | @@@@@@

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, True))

[14863.073 ; 15658.737) | @@@@@@@@@@@@@@
[15658.737 ; 16397.914) | 
[16397.914 ; 17137.092) | 
[17137.092 ; 17876.269) | 
[17876.269 ; 18615.446) | 
[18615.446 ; 18981.118) | 
[18981.118 ; 19720.295) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19720.295 ; 20512.543) | @@@@

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreSymbols, False))

[13785.427 ; 14537.175) | @@@@@@@@@@@@@@
[14537.175 ; 15288.923) | 
[15288.923 ; 16040.671) | 
[16040.671 ; 16792.419) | 
[16792.419 ; 17544.167) | 
[17544.167 ; 18045.402) | 
[18045.402 ; 18964.029) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[18964.029 ; 19744.608) | @@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

To note, @jefgen is thankfully doing some optimization work on the ICU side:

https://github.com/unicode-org/icu/pull/1471
https://github.com/unicode-org/icu/pull/1473

You can see the perf numbers in the ticket https://unicode-org.atlassian.net/browse/ICU-21388

Thanks @jefgen for the details.

@L2 you still have the option to use the ICU app-local feature to use the latest ICU version if you want.

  <ItemGroup>
        <RuntimeHostConfigurationOption Include="System.Globalization.AppLocalIcu" Value="68.2.0.6" />
        <PackageReference Include="Microsoft.ICU.ICU4C.Runtime" Version="68.2.0.6" />
  </ItemGroup>

@tarekgh I’ve run the benchmark that you have provided and got similar results that confirm that 5.0 using ICU is faster in this particular case.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1082 (1909/November2018Update/19H2)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-rc.1.20452.10
  [Host]     : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  Job-KYODGR : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  Job-PPECTW : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT

| Method | Job | Runtime | Toolchain | Mean | Error | StdDev | Median | Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| StringCompare | Job-KYODGR | .NET Core 3.1 | netcoreapp3.1 | 377.30 ns | 4.741 ns | 4.435 ns | 376.14 ns | 1.00 |
| StringCompare | Job-PPECTW | .NET Core 5.0 | netcoreapp5.0 | 68.72 ns | 1.342 ns | 3.055 ns | 67.42 ns | 0.19 |

To compare apples to apples I’ve set the invocation count (the number of benchmark invocations per iteration) to the same number and filtered the ETW trace file to the last benchmark iteration (description of the mentioned filtering):

--invocationCount 2097152 --profiler ETW

For 3.1, a single iteration (2097152 invocations) takes 791 ms.

For 5.0, a single iteration (2097152 invocations) takes 147 ms.

> we already fixed all ordinal[IgnoreCase] operations perf across all functionality

I see. Many thanks!

> you still have the option to switch back using NLS

I think it makes no sense, because more and more users work in a heterogeneous environment, and with ICU there is less chance of getting different results.
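
For readers who do want to fall back to NLS on Windows, the documented switch is the `System.Globalization.UseNls` runtime setting. A minimal project-file sketch, in the same style as the app-local ICU snippet earlier in this thread:

```xml
<ItemGroup>
  <!-- Opt this app back into NLS on Windows instead of ICU. -->
  <RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" />
</ItemGroup>
```

The same setting can also be written as `"System.Globalization.UseNls": true` under `configProperties` in runtimeconfig.json, or supplied through the `DOTNET_SYSTEM_GLOBALIZATION_USENLS` environment variable. It only has an effect on Windows.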

> By the way, PowerShell has already been running on Linux with ICU for a while; why are you concerned now?

Questions like why the performance/behavior is different on Windows and on Unix are very inconvenient. Such questions sometimes appear in the PowerShell repo (not related to this topic). With the move to ICU the likelihood of such questions decreases, and this is great. PowerShell is highly dependent on OrdinalIgnoreCase and any improvements here have a positive impact on it. This is my only concern.

Thanks again for your great work!

Also, I want to be clear about the expectation here. I don’t think we can fix all perf here, as we are limited by calling ICU. We can look at how to improve it, but I am not expecting to get the perf to the point where it was when we called NLS. So it would be good to decide which items in this list are blockers. The only one was the Ordinal cases, which I am addressing in the attached PR. I am not aware of any other blocking scenario. We’ll of course look more at other scenarios anyway, but I am not sure how much we can do before the 5.0 release.

Just wanted to update this thread with the auto-filed results showing big wins (70%) in the System.Memory.ReadOnlySpan benchmarks that regressed.

https://github.com/DrewScoggins/performance-2/issues/1392

@GrabYourPitchforks @GSPP just to let you know, I am looking at the linguistic IndexOf scenario and experimenting with some changes that may help perf (including internally caching some ICU objects). No promises yet, as I am still in the middle of looking at that.

Also, we are following up with the Windows team, as they are trying to make some perf enhancements to ICU too. That is another win.

Last, as ICU is an open source project, we have contributed some changes before, which means it is possible we’ll contribute more if it is really required to enhance .NET scenarios. But in general this is something we may look at for the 6.0 version and beyond.

As a side note, ICU is open source and we can contribute to make hot paths faster

The story is unfortunately a bit complicated… The version of ICU that is part of the Windows OneCore base was updated to a newer version, version 68.2. However, Windows 10 is still based on the older OneCore release, so this means that Windows 10 didn’t get the updated version of ICU. The Windows 11 release was built on the newer OneCore bits, so it got the updated version of ICU. (You can see that the Windows 10 build number is still ~1904x, so the last few releases are all based on the same underlying OneCore bits).

What this means is that Windows 10 is still on ICU version 64.2, while Windows 11 is on ICU version 68.2.

> roll out an updated icu.dll via windows updates anytime soon?

I don’t think there are currently any plans to push an updated version of ICU via Windows update at this time.

> built with an older MSVC2017

AFAIK, the OS binaries aren’t built using the public compiler. However, I’m not sure off-hand what exact version of the cl.exe compiler is used.

@jefgen very nice improvements! did Windows Team consider using PGO (Profile Guided Optimization) to improve ICU perf?

For those who don’t know, PGO stands for profile-guided optimization.

@DrewScoggins today I merged the other optimization work, which targets string search operations (IndexOf/LastIndexOf/IsPrefix/IsSuffix/StartsWith/EndsWith). Could you please watch the perf results after running my changes and update this issue? Thanks a lot.

@iSazonov I tried your scenario (using .NET without PS) and I am seeing results similar to what you reported. This is very interesting. I am going to look into the details to understand what is going on.

@GSPP I don’t think we have extensively studied what contributions we can make here. Regardless, ICU and NLS have different operational philosophies when it comes to this, and ICU’s consumers are absolutely used to caching the search object. The canonical scenario for linguistic searching in ICU is for a UI-based application. You open a browser window or word document, enter your search term, then see all matches highlighted in the document and use Next / Previous to iterate through them.

In our case, we care about non-linguistic (ordinal) searching. If this were to be contributed back to ICU it would be in the form of a brand new API. Furthermore, the type of ordinal comparison we’re using here (conversion to uppercase) is different from Unicode’s own recommendation (conversion to case-fold). All of this is to say that I’m not hopeful of a specialty API like the one .NET is using making its way through.
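
To make the ordinal versus linguistic distinction concrete, here is a minimal C# sketch. The sample strings and the choice of `en-US` are illustrative assumptions, not taken from the benchmarks in this thread. The point is only which calls are ordinal and which are culture-sensitive; the culture-sensitive ones are those that depend on the underlying globalization library (ICU or NLS).

```csharp
using System;
using System.Globalization;

class OrdinalVersusLinguistic
{
    static void Main()
    {
        // Illustrative inputs (assumed, not from the benchmark suite).
        string text = "The quick brown fox";
        string word = "BROWN";

        // Ordinal: compares UTF-16 code units directly, no culture data involved.
        int ordinal = text.IndexOf(word, StringComparison.Ordinal);

        // OrdinalIgnoreCase: ordinal comparison that ignores case differences.
        int ordinalIgnoreCase = text.IndexOf(word, StringComparison.OrdinalIgnoreCase);

        // Linguistic: culture-sensitive search through CompareInfo.
        CompareInfo compareInfo = CultureInfo.GetCultureInfo("en-US").CompareInfo;
        int linguistic = compareInfo.IndexOf(text, word, CompareOptions.IgnoreCase);

        Console.WriteLine($"Ordinal: {ordinal}");
        Console.WriteLine($"OrdinalIgnoreCase: {ordinalIgnoreCase}");
        Console.WriteLine($"Linguistic (en-US, IgnoreCase): {linguistic}");
    }
}
```

For plain ASCII input like this, the two case-insensitive calls should locate the same match, so the practical difference is cost rather than result. Where linguistic rules are not needed, passing `StringComparison.Ordinal` or `StringComparison.OrdinalIgnoreCase` explicitly keeps a call off the culture-sensitive path.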

Would it be in the cards to contribute to ICU so that this operation is not by-design slow? I imagine that many projects relying on ICU would like a fast IndexOf that does not require caching a searcher object.

@tarekgh your change is good. I was just curious why Linux and Windows perf were still significantly different but I realized I compared the wrong rows.

Thanks for the info, it is very helpful. Could you please try the latest .NET builds, which include the Ordinal perf improvements, and let us know if you still see the differences? Let me know if you need help with that.

The day before, the PowerShell MSFT team tried to move to .NET 5.0 Preview 8, but without success. I guess I can do some measurements only after we move to RC1.

/cc @SteveL-MSFT @daxian-dbw for information.

Also note that this is already what is used on Linux and Mac, and increasingly used within Windows OS itself. So we are aligning with the industry here. If and where it is slow - we all benefit from making it faster. The .NET team have contributed bug reports and performance improvements to libicu in the past, and I expect we will do so again.

> are you going to close all other bugs complaining about ICU perf against this one?

@tarekgh I’ve gone through all System.Memory and System.Globalization issues with performance tag and updated the list. It should be complete now.