runtime: Non-ASCII chars shouldn't compare equal to ASCII chars under OrdinalIgnoreCase comparison
Under ICU, there are some non-ASCII code points that become ASCII code points after a simple case mapping transformation.
'K' (U+212A KELVIN SIGN) ~= 'k' (U+006B LATIN SMALL LETTER K) [simple lowercase mapping]
'ſ' (U+017F LATIN SMALL LETTER LONG S) ~= 'S' (U+0053 LATIN CAPITAL LETTER S) [simple uppercase mapping]
Since it’s common for applications to use StringComparison.OrdinalIgnoreCase when comparing things like usernames, this could be a pit of failure for those applications, as it could lead to the following behavior at runtime.
string.Equals("administrator", "adminiſtrator", StringComparison.OrdinalIgnoreCase) // <-- FALSE on Windows, TRUE on Linux
A fairly straightforward fix would be to prevent non-ASCII chars and ASCII chars from comparing equal under an OrdinalIgnoreCase comparison. It would mean that OrdinalIgnoreCase is no longer a direct wrapper around ICU’s case mapping / case folding APIs, but it would bring the behavior more in line with what developers have come to expect over .NET’s history.
With this proposal, ToUpperInvariant and ToLowerInvariant would be direct wrappers around ICU’s underlying simple case mapping APIs, and they wouldn’t special-case any characters.
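To make the proposed rule concrete, here is a minimal sketch of a comparison that never treats an ASCII char and a non-ASCII char as equal. The class and method names (`AsciiGuardedOrdinalIgnoreCase`, `AreEqual`) are hypothetical and not part of .NET, and the char-by-char loop is only an illustration of the idea, not how the runtime would implement it.

```csharp
using System;

// Hypothetical helper, not a .NET API: sketches the proposed rule that an
// ASCII char and a non-ASCII char never compare equal when ignoring case.
public static class AsciiGuardedOrdinalIgnoreCase
{
    public static bool AreEqual(string a, string b)
    {
        if (a is null || b is null) return object.ReferenceEquals(a, b);
        if (a.Length != b.Length) return false;

        for (int i = 0; i < a.Length; i++)
        {
            char x = a[i], y = b[i];
            if (x == y) continue;

            // Reject pairs that straddle the ASCII boundary (e.g. 's' vs U+017F)
            // before any case mapping is consulted.
            if ((x <= 0x7F) != (y <= 0x7F)) return false;

            // Both chars are on the same side of the boundary:
            // fall back to simple invariant upper-casing.
            if (char.ToUpperInvariant(x) != char.ToUpperInvariant(y)) return false;
        }
        return true;
    }
}
```

Under this sketch, `AsciiGuardedOrdinalIgnoreCase.AreEqual("administrator", "adminiſtrator")` returns false regardless of what the underlying case mapping says about U+017F, while ordinary ASCII case differences still compare equal.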
Related: https://github.com/dotnet/runtime/pull/27540
The following APIs would be affected:
- string.Equals, string.Compare, string.GetHashCode, and any other APIs which might accept StringComparison.OrdinalIgnoreCase as a parameter
- StringComparer.OrdinalIgnoreCase.Equals and StringComparer.OrdinalIgnoreCase.GetHashCode
- CompareInfo.Compare and similar APIs which might accept CompareOptions.OrdinalIgnoreCase
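GetHashCode appears in this list because equality and hashing have to change together: two strings that compare equal must hash to the same value, or hash-based collections break. A small sketch of that invariant, using the built-in StringComparer.OrdinalIgnoreCase (the `HashConsistencyProbe` class is a hypothetical name for illustration only):

```csharp
using System;

// Hypothetical probe, not a .NET API.
public static class HashConsistencyProbe
{
    // True when the Equals/GetHashCode contract holds for this pair:
    // strings that compare equal must also produce the same hash code.
    public static bool ContractHolds(string a, string b)
    {
        StringComparer cmp = StringComparer.OrdinalIgnoreCase;
        return !cmp.Equals(a, b) || cmp.GetHashCode(a) == cmp.GetHashCode(b);
    }
}
```

Whatever the runtime decides about a pair such as "administrator" and "adminiſtrator", this check should hold both before and after the change.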
/cc @tarekgh
About this issue
- State: closed
- Created 4 years ago
- Comments: 28 (28 by maintainers)
I would expose new APIs to support different lengths, for compat reasons. I have seen usage where consumers of the current APIs always assumed the same input length.
Just to clarify, you’re suggesting:
This means that string.Equals(a, b, StringComparison.OrdinalIgnoreCase) will no longer be equivalent to string.Equals(a.ToUpperInvariant(), b.ToUpperInvariant(), StringComparison.Ordinal), which is a behavioral change from previous framework releases. I’m ok with your suggestion here if you think we can swing the breaking change. 😃
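One way to see whether the two expressions agree on a given runtime is to evaluate both for the same pair of strings. The sketch below does that; `CaseEquivalenceProbe` and its methods are hypothetical names, and the result for non-ASCII inputs depends on the platform and .NET version, so no output is assumed for those.

```csharp
using System;

// Hypothetical probe, not a .NET API.
public static class CaseEquivalenceProbe
{
    // The comparison as most callers write it.
    public static bool OrdinalIgnoreCaseEquals(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // The historically equivalent form: upper-case both sides, then compare ordinally.
    public static bool UpperInvariantEquals(string a, string b) =>
        string.Equals(a.ToUpperInvariant(), b.ToUpperInvariant(), StringComparison.Ordinal);

    // True when the two forms disagree for this pair on the current runtime.
    public static bool Diverges(string a, string b) =>
        OrdinalIgnoreCaseEquals(a, b) != UpperInvariantEquals(a, b);
}
```

Calling `CaseEquivalenceProbe.Diverges("administrator", "adminiſtrator")` on different platforms would show where the equivalence stops holding; for pure ASCII input such as `Diverges("abc", "ABC")` the two forms agree.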