runtime: [Uri] Paths with Unicode/UTF-8 incorrectly parsed/reported by System.Uri

General

As the title says, paths containing Unicode/UTF-8 characters are incorrectly parsed and reported by System.Uri, resulting in an invalid path. For example, the path “/üri” results in a Uri like “file:///%C3%BCri/üri” (note the unescaped /üri at the end).

This also happens with other Unicode/UTF-8 characters such as £, §, etc. You can replace ü with any other non-ASCII character in my example and see the same result, i.e. the path is doubled.

Expected Result

PathAndQuery = AbsolutePath = "/%C3%BCri"
AbsoluteUri = "file:///%C3%BCri"

Results collected using Mono 5.14 on the same Linux machine.

Actual Result

PathAndQuery = AbsolutePath = "/%C3%BCri" // This seems to be correct
AbsoluteUri = "/%C3%BCri/%C3%BCri" // Note the additional "/%C3%BCri" at the end
_string = "/üri/üri" // Note the additional "/üri" at the end

Results collected using the .NET Core version listed below.

System Information

$ dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.1.302
 Commit:    9048955601

Runtime Environment:
 OS Name:     gentoo
 OS Version:
 OS Platform: Linux
 RID:         gentoo-x64
 Base Path:   /opt/dotnet_core/sdk/2.1.302/

Host (useful for support):
  Version: 2.1.2
  Commit:  811c3ce6c0

Code to reproduce

using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var uri = new Uri("/üri");
            Console.WriteLine(uri.ToString()); // file:///%C3%BCri/üri
        }
    }
}

The code above prints “file:///%C3%BCri/üri”, while “file:///%C3%BCri” is expected.

Passing “/üri/üri” to the Uri constructor results in “file:///%C3%BCri/%C3%BCri/üri/üri”.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 15 (11 by maintainers)

Most upvoted comments

I have looked at the issue on Linux; the problem has nothing to do with ICU. Here is what is going on:

The Uri code detects that it is running on Linux and that the string starts with ‘/’, which means it could be a valid file path, so it stores the original value “/üri” in the internal uri._string. Later, the code calls the method ParseRemaining, which in turn calls EscapeUnescapeIri.

https://github.com/dotnet/corefx/blob/f5b57382cd5fef53cf09e5fd6b9b9812dfddb953/src/System.Private.Uri/src/System/Uri.cs#L3394

EscapeUnescapeIri returns “/üri”, and the code then concatenates this value onto the original _string. That means _string now stores “/üri/üri”.

Later, the code tries to get the host name. It decides that the host name is the first 4 characters, “/üri”, and calls the EscapeString helper method to normalize this name, which returns “/%C3%BCri”.

https://github.com/dotnet/corefx/blob/f5b57382cd5fef53cf09e5fd6b9b9812dfddb953/src/System.Private.Uri/src/System/Uri.cs#L2520

That makes the whole URI “file:///%C3%BCri/üri”.

Let me know if I can help with anything more.
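
The sequence described in the comment above can be imitated with a small stand-alone sketch. This is not the corefx implementation: UriDoublingSketch, EscapeNonAscii and Simulate are hypothetical names introduced only for illustration, and the escaping rule is simplified on the assumption that only non-ASCII characters need percent-encoding. It only shows how appending the IRI-processed text to the stored string, and then escaping just the leading portion, would produce the doubled path.

```csharp
using System;
using System.Text;

// Illustrative model of the reported behaviour; all names here are hypothetical.
static class UriDoublingSketch
{
    // Simplified stand-in for the internal escaping helper: percent-encodes
    // each non-ASCII character as its UTF-8 bytes and leaves ASCII untouched.
    public static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] < 0x80)
            {
                sb.Append(s[i]);
                continue;
            }

            // Keep surrogate pairs together so they encode as one code point.
            bool isPair = char.IsHighSurrogate(s[i])
                          && i + 1 < s.Length
                          && char.IsLowSurrogate(s[i + 1]);
            int len = isPair ? 2 : 1;

            foreach (byte b in Encoding.UTF8.GetBytes(s.Substring(i, len)))
            {
                sb.Append('%').Append(b.ToString("X2"));
            }

            i += len - 1;
        }
        return sb.ToString();
    }

    // Models the three steps from the comment for an implicit Unix file path.
    public static string Simulate(string original)
    {
        // Step 1: the stored string starts out as the original input.
        string stored = original;

        // Step 2: the IRI-processed text (unchanged for this input) is
        // appended to the stored string instead of replacing it.
        stored += original;

        // Step 3: only the first original.Length characters are escaped;
        // the appended copy is left as-is.
        string escapedHead = EscapeNonAscii(stored.Substring(0, original.Length));
        string unescapedTail = stored.Substring(original.Length);

        return "file://" + escapedHead + unescapedTail;
    }
}
```

Under these assumptions, UriDoublingSketch.Simulate("/üri") returns “file:///%C3%BCri/üri”, and UriDoublingSketch.Simulate("/üri/üri") returns “file:///%C3%BCri/%C3%BCri/üri/üri”, which matches the two outputs reported in the issue.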