runtime: String.StartsWith slower on Linux with some characters

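string.StartsWith with the default, culture-sensitive comparison is about two orders of magnitude slower on Linux (roughly 120×: 4,411 ns vs. 36 ns in the benchmark below) when the source string contains a dash (-).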

On Linux:

    BenchmarkDotNet=v0.11.3, OS=centos 7
    Intel Xeon CPU E5-2630L v3 1.80GHz, 2 CPU, 32 logical and 16 physical cores
      [Host]     : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT
      Job-UBBGCZ : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT

    Runtime=Core  Toolchain=netcoreapp3.0

             Method |        Mean |      Error |     StdDev |
    --------------- |------------:|-----------:|-----------:|
         StartsWith |    35.79 ns |  0.1069 ns |  0.0948 ns |
     StartsWithDash | 4,411.13 ns | 35.0054 ns | 29.2311 ns |

On Windows (for reference only; the hardware is different):

    BenchmarkDotNet=v0.11.3, OS=Windows 10.0.18362
    Intel Xeon CPU E3-1271 v3 3.60GHz, 1 CPU, 8 logical and 4 physical cores
    .NET Core SDK=3.0.100-preview8-013656
      [Host]     : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT
      DefaultJob : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT

             Method |     Mean |     Error |    StdDev |
    --------------- |---------:|----------:|----------:|
         StartsWith | 69.42 ns | 0.2523 ns | 0.2236 ns |
     StartsWithDash | 69.47 ns | 1.4200 ns | 1.6904 ns |

Benchmark code:

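    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class StartsWithBenchmark
    {
        // Same length; only the last character differs ('z' vs '-').
        private string _str1 = "aaaaaaaaaz";
        private string _str2 = "aaaaaaaaa-";

        // StartsWith(string) performs a culture-sensitive comparison with the
        // current culture (ICU on Linux, NLS on Windows in .NET Core 3.0).
        [Benchmark]
        public bool StartsWith()
        {
            return _str1.StartsWith("i");
        }

        [Benchmark]
        public bool StartsWithDash()
        {
            return _str2.StartsWith("i");
        }
    }

    public static class Program
    {
        // Runs both benchmarks and prints a summary table like the ones above.
        public static void Main() => BenchmarkRunner.Run<StartsWithBenchmark>();
    }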

The slowdown does not occur with ordinal comparison (StringComparison.Ordinal).
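To check the ordinal workaround without BenchmarkDotNet, here is a rough, self-contained sketch. The class and method names are hypothetical. It times the culture-sensitive and ordinal overloads on the dash string with a plain Stopwatch loop, which is far less rigorous than BenchmarkDotNet, so treat its numbers as indicative only. For this input both overloads return false, so switching to ordinal does not change the result here. In general, though, ordinal comparison has different semantics: it compares UTF-16 code units and applies no linguistic rules.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class: compares the default culture-sensitive
// StartsWith with the ordinal overload on the same input.
public static class OrdinalWorkaround
{
    // Same comparison as s.StartsWith(prefix): current culture
    // (backed by ICU on Linux in .NET Core 3.0).
    public static bool CultureStartsWith(string s, string prefix) =>
        s.StartsWith(prefix, StringComparison.CurrentCulture);

    // Ordinal: compares UTF-16 code units directly, no collation involved.
    public static bool OrdinalStartsWith(string s, string prefix) =>
        s.StartsWith(prefix, StringComparison.Ordinal);

    // Rough average cost per call in nanoseconds. A plain Stopwatch loop,
    // far less rigorous than BenchmarkDotNet; numbers are indicative only.
    public static double NsPerCall(Func<bool> f, int iterations)
    {
        f(); // warm-up call so first-call JIT cost is not timed
        bool sink = false;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink ^= f(); // use the result of every call
        }
        sw.Stop();
        GC.KeepAlive(sink);
        return sw.Elapsed.TotalMilliseconds * 1_000_000 / iterations;
    }

    // Prints the rough per-call cost of both overloads for the dash input.
    public static void Report()
    {
        const string dash = "aaaaaaaaa-";
        double culture = NsPerCall(() => CultureStartsWith(dash, "i"), 100_000);
        double ordinal = NsPerCall(() => OrdinalStartsWith(dash, "i"), 100_000);
        Console.WriteLine($"culture-sensitive: {culture:F1} ns/call");
        Console.WriteLine($"ordinal:           {ordinal:F1} ns/call");
    }
}
```

Call OrdinalWorkaround.Report() from any console app. On an affected Linux build the culture-sensitive line would be expected to show the larger cost, but no specific timings are implied here.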

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 20 (19 by maintainers)

Most upvoted comments

@tarekgh It’s not really helpful to simply label this as a corner case. People expect consistent performance, and a method that suddenly becomes orders of magnitude slower because certain characters appear in the input is like a time bomb dropped into people’s code.

Together, https://github.com/dotnet/coreclr/pull/26759 and https://github.com/dotnet/coreclr/pull/26621 have fixed this problem.

Fun fact: while working on improving the performance of StartsWith on Linux, we found and fixed an 18-year-old bug in ICU: https://github.com/unicode-org/icu/pull/840 😉

@kevingosse thanks for your measurements. I believe @adamsitnik’s PR will help somewhat with the StartsWith scenario.

@tarekgh is there any reason why we should not implement StartsWith in the following way:

    bool StartsWith(string source, string prefix, StringComparison stringComparison)
      => CompareString(source.AsSpan(start: 0, length: prefix.Length), prefix, stringComparison);

Edit: never mind, I’ve got an answer from @kevingosse in https://github.com/dotnet/coreclr/pull/26481 😉
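Whatever the answer in the linked PR was, one reason the slicing approach cannot work in general is that cutting source to prefix.Length is only sound for ordinal comparisons. Under a linguistic comparison, the part of source that matches prefix can have a different length, for example when source starts with an ignorable character such as U+00AD SOFT HYPHEN, or when a precomposed character is matched against a base letter plus a combining mark. The proposed one-liner would also throw ArgumentOutOfRangeException whenever prefix is longer than source. Below is a minimal sketch with a hypothetical helper name. It keeps the slicing fast path for ordinal comparisons only and defers everything else to the built-in overload.

```csharp
using System;

public static class PrefixSketch
{
    // Hypothetical helper illustrating the slicing idea from the comment above,
    // restricted to the comparisons where slicing is valid.
    public static bool StartsWithSliced(string source, string prefix, StringComparison comparison)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (prefix is null) throw new ArgumentNullException(nameof(prefix));

        switch (comparison)
        {
            case StringComparison.Ordinal:
            case StringComparison.OrdinalIgnoreCase:
                // Ordinal matches are char-for-char, so the matching part of
                // source has exactly prefix.Length chars. Check the length first:
                // AsSpan(0, n) throws if n > source.Length.
                return prefix.Length <= source.Length
                    && source.AsSpan(0, prefix.Length).Equals(prefix.AsSpan(), comparison);
            default:
                // Linguistic comparisons: the matching part of source may be
                // shorter or longer than prefix, so defer to the built-in method.
                return source.StartsWith(prefix, comparison);
        }
    }
}
```

This sketch does not address the Linux slowdown itself, since the linguistic branch simply forwards to string.StartsWith. It only shows where slicing is and is not safe.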