runtime: [Uri] Paths with Unicode/UTF-8 incorrectly parsed/reported by System.Uri
General
As the title says, System.Uri incorrectly parses and reports paths that contain Unicode/UTF-8 (non-ASCII) characters, which results in an invalid path. For example, the path “/üri” results in a Uri like “file:///%C3%BCri/üri” (note the unescaped “/üri” at the end).
This also happens with other Unicode/UTF-8 characters such as £ and §. You can replace ü with any other such character in my example and see the same result, i.e. the path is doubled.
Expected Result
PathAndQuery = AbsolutePath = "/%C3%BCri"
AbsoluteUri = "file:///%C3%BCri"
Results collected using mono 5.14 on the same Linux machine.
Actual Result
PathAndQuery = AbsolutePath = "/%C3%BCri" // This seems to be correct
AbsoluteUri = "/%C3%BCri/%C3%BCri" // Note the additional "/%C3%BCri" at the end
_string = "/üri/üri" // Note the additional "/üri" at the end
Using the .NET Core version mentioned below.
System Information
$ dotnet --info
.NET Core SDK (reflecting any global.json):
Version: 2.1.302
Commit: 9048955601
Runtime Environment:
OS Name: gentoo
OS Version:
OS Platform: Linux
RID: gentoo-x64
Base Path: /opt/dotnet_core/sdk/2.1.302/
Host (useful for support):
Version: 2.1.2
Commit: 811c3ce6c0
Code to reproduce
using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var uri = new Uri("/üri");
            Console.WriteLine(uri.ToString()); // file:///%C3%BCri/üri
        }
    }
}
The code above prints “file:///%C3%BCri/üri”, while “file:///%C3%BCri” is expected.
Passing “/üri/üri” to the Uri constructor results in a path like “file:///%C3%BCri/%C3%BCri/üri/üri”.
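To compare against the Expected Result and Actual Result sections, it may help to print the individual properties rather than only ToString(). The following is a minimal sketch; the class and method names (UriPropertyDump, Describe) are made up for illustration. It uses Uri.TryCreate so that it does not throw on platforms where a leading “/” is not accepted as an implicit file path.

```csharp
using System;

static class UriPropertyDump
{
    // Hypothetical helper: returns the three properties compared in this report,
    // or null when the input is not accepted as an absolute URI on this platform.
    public static string[] Describe(string path)
    {
        if (!Uri.TryCreate(path, UriKind.Absolute, out Uri uri))
            return null;

        return new[]
        {
            "AbsolutePath = " + uri.AbsolutePath,
            "PathAndQuery = " + uri.PathAndQuery,
            "AbsoluteUri  = " + uri.AbsoluteUri,
        };
    }

    static void Main()
    {
        string[] lines = Describe("/üri");
        Console.WriteLine(lines == null
            ? "Input was not accepted as an absolute URI on this platform."
            : string.Join(Environment.NewLine, lines));
    }
}
```

The printed values depend on the runtime version and operating system, so no output is listed here.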
About this issue
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 15 (11 by maintainers)
I have looked at the issue on Linux; the problem has nothing to do with ICU. Here is what happens:
The Uri code detects that it is running on Linux and that the string starts with ‘/’, which means it could be a valid file path, and it stores the original value “/üri” in the internal uri._string. Later, the code calls the method ParseRemaining, which calls EscapeUnescapeIri.
https://github.com/dotnet/corefx/blob/f5b57382cd5fef53cf09e5fd6b9b9812dfddb953/src/System.Private.Uri/src/System/Uri.cs#L3394
EscapeUnescapeIri returns “/üri”, and the code then concatenates this value to the original _string. That means _string now stores “/üri/üri”.
Later, the code tries to get the host name. It decides that the host name is the first 4 characters, “/üri”, and calls the EscapeString helper method to normalize this name, which returns “/%C3%BCri”.
https://github.com/dotnet/corefx/blob/f5b57382cd5fef53cf09e5fd6b9b9812dfddb953/src/System.Private.Uri/src/System/Uri.cs#L2520
That makes the whole URI “file:///%C3%BCri/üri”.
Let me know if I can help with anything more.
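The sequence described in this comment can be modelled in a few lines. This is only a sketch of the described behaviour, not the actual Uri.cs code: Simulate and EscapeNonAscii are hypothetical names, and the escaping is a simplified stand-in for EscapeString that percent-encodes non-ASCII UTF-8 bytes only. It also assumes that EscapeUnescapeIri returns the input unchanged, as reported above.

```csharp
using System;
using System.Text;

static class UriDoublingSketch
{
    // Simplified stand-in for the escaping step (assumption): percent-encode
    // every non-ASCII UTF-8 byte and leave ASCII characters such as '/' alone.
    static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            if (b < 0x80) sb.Append((char)b);
            else sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    // Hypothetical model of the steps described in the comment above.
    public static string Simulate(string original)
    {
        string stored = original;   // 1. _string keeps the raw input
        string iri = original;      // 2. EscapeUnescapeIri reportedly returns it unchanged
        stored += iri;              // 3. the result is appended instead of replacing _string

        // 4. Only the first original.Length characters are escaped; the appended
        //    copy is left as-is and ends up as the unescaped tail.
        string head = EscapeNonAscii(stored.Substring(0, original.Length));
        string tail = stored.Substring(original.Length);
        return "file://" + head + tail;
    }

    static void Main()
    {
        Console.WriteLine(Simulate("/üri"));     // file:///%C3%BCri/üri
        Console.WriteLine(Simulate("/üri/üri")); // file:///%C3%BCri/%C3%BCri/üri/üri
    }
}
```

Both outputs match the strings reported in the issue, which is consistent with the explanation that the IRI-processed copy is appended to _string rather than replacing it.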