runtime: Non-ASCII chars shouldn't compare equal to ASCII chars under OrdinalIgnoreCase comparison

Under ICU, there are some non-ASCII code points that become ASCII code points after a simple case mapping transformation.

'K' (U+212A KELVIN SIGN) ~= 'k' (U+006B LATIN SMALL LETTER K) [simple lowercase mapping]
'ſ' (U+017F LATIN SMALL LETTER LONG S) ~= 'S' (U+0053 LATIN CAPITAL LETTER S) [simple uppercase mapping]
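
One way to see these mappings on a given machine is to print the invariant upper and lower forms of each character as code points. This is only a probe sketch: `SimpleCaseMappingProbe` and `Describe` are hypothetical names introduced here, and the output for the two non-ASCII characters depends on the globalization backend (ICU or NLS) and on the runtime version, so no expected output is shown for them.

```csharp
using System;

static class SimpleCaseMappingProbe
{
    // Hypothetical helper (not a .NET API): formats a char together with its
    // invariant upper/lower mappings, all as U+XXXX code points.
    public static string Describe(char c) =>
        $"U+{(int)c:X4} -> upper U+{(int)char.ToUpperInvariant(c):X4}, lower U+{(int)char.ToLowerInvariant(c):X4}";

    static void Main()
    {
        Console.WriteLine(Describe('\u212A')); // KELVIN SIGN (result is platform-dependent)
        Console.WriteLine(Describe('\u017F')); // LATIN SMALL LETTER LONG S (result is platform-dependent)
        Console.WriteLine(Describe('k'));      // plain ASCII, for comparison
    }
}
```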

Since it’s common for applications to use StringComparison.OrdinalIgnoreCase when comparing things like usernames, this could be a pit of failure for those applications, as it could lead to the following behavior at runtime.

string.Equals("administrator", "adminiſtrator", StringComparison.OrdinalIgnoreCase) // <-- FALSE on Windows, TRUE on Linux

A fairly straightforward fix would be to prevent non-ASCII characters and ASCII characters from ever comparing equal under an OrdinalIgnoreCase comparison. It would mean that OrdinalIgnoreCase is no longer a direct wrapper around ICU’s case mapping / case folding APIs, but it would bring the behavior more in line with what developers have come to expect over .NET’s history.
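
The rule proposed above can be sketched as a standalone comparison routine. This is a minimal illustration, not the runtime's implementation: `AsciiAwareOrdinalIgnoreCase`, `AreEqual`, and `ToUpperAscii` are hypothetical names, the inputs are assumed to be non-null, and the sketch maps one UTF-16 code unit at a time, so it ignores surrogate pairs and requires equal lengths.

```csharp
using System;

// Hypothetical helper, not part of .NET: an ASCII character and a non-ASCII
// character are never treated as equal, whatever the case mapping says.
static class AsciiAwareOrdinalIgnoreCase
{
    public static bool AreEqual(string a, string b)
    {
        // One code unit maps to one code unit in this sketch, so lengths must match.
        if (a.Length != b.Length) return false;

        for (int i = 0; i < a.Length; i++)
        {
            char x = a[i], y = b[i];
            if (x == y) continue;

            bool xAscii = x <= 0x7F, yAscii = y <= 0x7F;
            if (xAscii != yAscii) return false; // mixed ASCII / non-ASCII: never equal

            if (xAscii)
            {
                // Both ASCII: fold a-z to A-Z without consulting ICU or NLS.
                if (ToUpperAscii(x) != ToUpperAscii(y)) return false;
            }
            else if (char.ToUpperInvariant(x) != char.ToUpperInvariant(y))
            {
                // Both non-ASCII: defer to the platform's simple case mapping.
                return false;
            }
        }
        return true;
    }

    static char ToUpperAscii(char c) => (c >= 'a' && c <= 'z') ? (char)(c - 0x20) : c;

    static void Main()
    {
        Console.WriteLine(AreEqual("administrator", "ADMINISTRATOR"));      // True
        Console.WriteLine(AreEqual("administrator", "admini\u017Ftrator")); // False on every platform
    }
}
```

Because the mixed ASCII / non-ASCII check runs before any platform case mapping is consulted, the long-s result no longer depends on the operating system. A matching hash function would need the same split so that equal strings still hash equally.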

With this proposal, ToUpperInvariant and ToLowerInvariant would be direct wrappers around ICU’s underlying simple case mapping APIs, and they wouldn’t special-case any characters.

Related: https://github.com/dotnet/runtime/pull/27540

The following APIs would be affected:

  • string.Equals, string.Compare, string.GetHashCode, and any other APIs which might accept StringComparison.OrdinalIgnoreCase as a parameter
  • StringComparer.OrdinalIgnoreCase.Equals and StringComparer.OrdinalIgnoreCase.GetHashCode
  • CompareInfo.Compare and similar APIs which might accept CompareOptions.OrdinalIgnoreCase

/cc @tarekgh

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 28 (28 by maintainers)

Most upvoted comments

I would expose new APIs to support inputs of different lengths, for compat reasons. I have seen consumers of the current APIs that always assume the inputs have the same length.

Just to clarify, you’re suggesting:

string.Equals("administrator", "adminiſtrator", StringComparison.OrdinalIgnoreCase) // <-- with this change, will return FALSE on all platforms
"administrator".ToUpperInvariant() == "adminiſtrator".ToUpperInvariant() // <-- will continue to return FALSE on Windows, TRUE on Linux

This means that string.Equals(a, b, StringComparison.OrdinalIgnoreCase) will no longer be equivalent to string.Equals(a.ToUpperInvariant(), b.ToUpperInvariant(), StringComparison.Ordinal), which is a behavioral change from previous framework releases.
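
The equivalence in question can be checked side by side on any given runtime. The sketch below is only a probe: `EquivalenceCheck`, `ViaOrdinalIgnoreCase`, and `ViaUpperInvariant` are hypothetical names, and the results for the long-s input vary by platform and runtime version, so the program prints them rather than asserting them.

```csharp
using System;

static class EquivalenceCheck
{
    // Form 1: the comparison the issue proposes to change.
    public static bool ViaOrdinalIgnoreCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Form 2: uppercase both sides first, then compare ordinally.
    public static bool ViaUpperInvariant(string a, string b) =>
        string.Equals(a.ToUpperInvariant(), b.ToUpperInvariant(), StringComparison.Ordinal);

    static void Main()
    {
        string plain = "administrator";
        string longS = "admini\u017Ftrator"; // contains U+017F LATIN SMALL LETTER LONG S

        bool first = ViaOrdinalIgnoreCase(plain, longS);
        bool second = ViaUpperInvariant(plain, longS);

        Console.WriteLine($"OrdinalIgnoreCase:          {first}");
        Console.WriteLine($"ToUpperInvariant + Ordinal: {second}");
        Console.WriteLine($"Forms agree:                {first == second}");
    }
}
```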

I’m ok with your suggestion here if you think we can swing the breaking change. 😃