runtime: [Uri] Uri.IsWellFormedUriString() returns false for a URL which is correct
I have a C# (.NET Core 1.1) app that needs to check whether a URL is valid. I used Uri.IsWellFormedUriString(), which works pretty well, but I have a doubt about the URL below, for which it returns false. It seems to me that the URL is perfectly valid.
Uri.IsWellFormedUriString("http://www.test.com/search/Le+Venezuela+b%C3%A9n%C3%A9ficie+d%27importantes+ressources+naturelles+%3A+p%C3%A9trole%2C+gaz%2C+mines", UriKind.Absolute)
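A minimal repro of the check above, written as a static helper class. The class name `UriCheckRepro` is hypothetical. It also calls `Uri.TryCreate`, which only asks whether the string parses as an absolute URI. Comparing the two results may help separate a parsing failure from a failure of the stricter well-formedness check. The `IsWellFormed` result is deliberately not asserted, because according to this thread it has differed across .NET versions.

```csharp
using System;

// Hypothetical repro helper; not part of the original report.
public static class UriCheckRepro
{
    // The URL from the original report.
    public const string Url =
        "http://www.test.com/search/Le+Venezuela+b%C3%A9n%C3%A9ficie+d%27importantes+ressources+naturelles+%3A+p%C3%A9trole%2C+gaz%2C+mines";

    // The strict check this issue is about.
    public static bool IsWellFormed(string url) =>
        Uri.IsWellFormedUriString(url, UriKind.Absolute);

    // Lenient check: true if the string parses as an absolute URI at all.
    public static bool CanParse(string url, out Uri uri) =>
        Uri.TryCreate(url, UriKind.Absolute, out uri);
}
```

Usage: `Console.WriteLine(UriCheckRepro.IsWellFormed(UriCheckRepro.Url));` prints the strict result, while `UriCheckRepro.CanParse(UriCheckRepro.Url, out var u)` shows whether the same string parses.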
I tested the very same URL with the PHP function below, which reports that the URL is correctly formatted:
filter_var($url, FILTER_VALIDATE_URL)
According to RFC 3986, this URL appears to be valid. Am I missing something here?
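To make the RFC 3986 argument concrete, here is a sketch that checks only a path string against the `pchar` rule (plus `/` separators). Each character must be unreserved, a sub-delim, `:`, `@`, `/`, or the start of a valid `%XX` triplet. The class and method names are hypothetical, this is not how `System.Uri` is implemented, and it ignores scheme, authority, query and fragment.

```csharp
using System;

// Hypothetical helper (not part of .NET): validates a path against RFC 3986 pchar + '/'.
public static class Rfc3986PathCheck
{
    // sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    private const string SubDelims = "!$&'()*+,;=";

    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static bool IsUnreserved(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';

    private static bool IsHex(char c) =>
        (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');

    public static bool IsValidPath(string path)
    {
        for (int i = 0; i < path.Length; i++)
        {
            char c = path[i];
            if (c == '%')
            {
                // pct-encoded = "%" HEXDIG HEXDIG
                if (i + 2 >= path.Length || !IsHex(path[i + 1]) || !IsHex(path[i + 2]))
                    return false;
                i += 2; // skip the two hex digits
            }
            else if (!(IsUnreserved(c) || SubDelims.IndexOf(c) >= 0 ||
                       c == ':' || c == '@' || c == '/'))
            {
                return false;
            }
        }
        return true;
    }
}
```

The path of the reported URL passes this check: `+` is a sub-delim and every `%` is followed by two hex digits.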
About this issue
- State: closed
- Created 7 years ago
- Reactions: 19
- Comments: 24 (12 by maintainers)
Commits related to this issue
- misc: Remove URI check from EmbedBuilder (#1778) `Uri.IsWellFormedUriString()` doesn't return the expected result for specific urls, removed until the DotNet team actually resolves it ( https://githu... — committed to discord-net/Discord.Net by asmejkal 3 years ago
- Added option to ignore URI parsing errors When initializing a DataServiceContext there was a mandatory URI check relying on .NETs Uri.IsWellFormedUriString. Sadly there is an active issue on the .NET... — committed to AkquinetRistec/odata.net by AntiGuideAkquinet 2 years ago
- Add unit tests for Uri.IsWellFormedUri Uri.IsWellFormedUri() reports a false negative when mixing characters like Å or ตั together with any of the RFC 3986 section 2.2 Reserved Characters, ! * ' ( ) ... — committed to julian94/runtime by deleted user 2 years ago
@karelz
I suspect the issue is related to combining characters that percent-encode to a single byte with characters that encode to multiple bytes. For example, 学 encodes to %E5%AD%A6, while [ encodes to %5B.
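The one-byte versus multi-byte distinction in this hypothesis can be checked directly: `Uri.EscapeDataString` percent-encodes the UTF-8 bytes of each character, and `Encoding.UTF8.GetByteCount` gives the number of `%XX` triplets. The class name is hypothetical; this only illustrates the encoding widths, not the bug itself.

```csharp
using System;
using System.Text;

// Hypothetical demo of how many %XX triplets a character expands to.
public static class EncodingWidthDemo
{
    // Percent-encodes the UTF-8 bytes of the input (RFC 3986 style, uppercase hex).
    public static string Encode(string s) => Uri.EscapeDataString(s);

    // Number of UTF-8 bytes, i.e. the number of %XX triplets after encoding.
    public static int Utf8ByteCount(string s) => Encoding.UTF8.GetByteCount(s);
}
```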
Here are some examples:
This URI also fails on .NET 6.0:
https://storage.googleapis.com/kocmoc/audio-archive/2021_10/2021_10_13 - The Last Dive with ΑΧΙΝΟΙ.mp3
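Note that this URL, as posted, contains literal spaces, which RFC 3986 does not allow in a URI. A like-for-like test would escape the path segments first. A minimal sketch, assuming each segment is escaped separately so the `/` separators survive (the class and method names are hypothetical):

```csharp
using System;
using System.Linq;

// Hypothetical helper: escapes each path segment, keeping '/' as the separator.
public static class SegmentEscaper
{
    public static string EscapePath(string path) =>
        string.Join("/", path.Split('/').Select(s => Uri.EscapeDataString(s)));
}
```

Usage: `"https://storage.googleapis.com/" + SegmentEscaper.EscapePath("kocmoc/audio-archive/2021_10/2021_10_13 - The Last Dive with ΑΧΙΝΟΙ.mp3")` produces an ASCII-only URL that can then be passed to `Uri.IsWellFormedUriString`.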
@karelz .NET 6.0 was just released… having to wait for .NET 7 for a critical bug that should have been fixed 4 years ago is a bit ridiculous.
It has 13 customer reports in total (1 reported offline); only 2 of them are upvotes on the top post (1 of which is the original post).
Can I ask everyone who has hit it to please upvote the top post? It will help us prioritize.
Moving it to .NET 7.0, as it has a rather large impact. cc @MihaZupan
In .NET Core 2.1 I am also encountering what looks like the same bug, or a very similar one.
However, if I leave the URI unencoded, it passes the IsWellFormedUriString check:
We also just encountered this issue.
I also just noticed this is marked up-for-grabs. A friend and I might be interested in working on it. @karelz, is it OK to open a PR? As for the documentation, whose responsibility would that be: the PR author's or yours? I imagine the latter.
I just ran into this issue at work and can add that combining non-ASCII characters like Å or ตั with any of the RFC 3986 section 2.2 reserved characters (! * ' ( ) ; : @ & = + $ , / ? # [ ]) fails, e.g. Å*.
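A small harness for reproducing this observation: it appends one non-ASCII character followed by each RFC 3986 section 2.2 reserved character to a base URL and records what `Uri.IsWellFormedUriString` reports. The class name and base URL are hypothetical, and no results are assumed, since they depend on the .NET version. Characters like `?` and `#` start a query or fragment rather than staying in the path, which is worth keeping in mind when reading the output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repro harness; results vary by .NET version.
public static class ReservedCharMatrix
{
    // RFC 3986 section 2.2: gen-delims ":/?#[]@" followed by sub-delims "!$&'()*+,;=".
    public const string Reserved = ":/?#[]@!$&'()*+,;=";

    // Builds "<baseUrl><prefix><reserved char>" for each reserved character.
    public static List<(string Candidate, bool WellFormed)> Run(string baseUrl, string prefix)
    {
        var results = new List<(string, bool)>();
        foreach (char c in Reserved)
        {
            string candidate = baseUrl + prefix + c;
            results.Add((candidate, Uri.IsWellFormedUriString(candidate, UriKind.Absolute)));
        }
        return results;
    }
}
```

Usage: `foreach (var (candidate, ok) in ReservedCharMatrix.Run("http://example.com/", "Å")) Console.WriteLine($"{candidate} -> {ok}");`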
I just ran into this one. You probably have plenty of examples, but just to further confirm @nicholasb90's hypothesis:
Is this a recommended workaround, i.e. using Uri.UnescapeDataString on the string before testing it? It makes my example pass, but I'm not sure whether there are pitfalls.
As @svick said, I managed to overcome this issue by decoding the URL.
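One pitfall of unescaping before the check: `Uri.UnescapeDataString` decodes every `%XX` sequence, including encoded delimiters such as `%2F` (`/`) and `%3F` (`?`). The decoded string can therefore describe a different resource than the original. The class name below is hypothetical.

```csharp
using System;

// Hypothetical demo of how decoding can change the structure of a URL.
public static class UnescapePitfall
{
    // Decodes all %XX sequences, including encoded '/', '?' and '#'.
    public static string Decode(string url) => Uri.UnescapeDataString(url);
}
```

For example, `http://example.com/a%2Fb%3Fc` has a single path segment, but after decoding it becomes `http://example.com/a/b?c`, i.e. path `/a/b` plus query `?c`. For the URL in the original report, the encoded characters (`%27`, `%3A`, `%2C`) decode to `'`, `:` and `,`, which are all legal inside a path segment, so the structure is preserved there.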
Try enabling IDN and IRI parsing in your App.config by adding this to your configuration section, to ensure correct handling of international character sets:
After doing this, create a decoded version of your URL like this, to avoid complications between encoded and decoded URLs:
Now you can check like this:
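The commenter's original snippets are not preserved in this thread, so the following is only a sketch of the "decode, then check" approach as described, assuming `Uri.UnescapeDataString` for decoding and an absolute-URI check. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper sketching the "decode, then check" workaround.
public static class DecodedUriCheck
{
    public static bool IsWellFormedDecoded(string url)
    {
        // Decode percent-encoded sequences first (e.g. %C3%A9 -> é).
        string decoded = Uri.UnescapeDataString(url);
        // Run the same absolute-URI well-formedness check on the decoded form.
        return Uri.IsWellFormedUriString(decoded, UriKind.Absolute);
    }
}
```

Because decoding also turns encoded delimiters back into real ones, this check can accept or reject a URL for reasons unrelated to the original string's validity.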
Maybe this is not a perfect solution, but it is the closest I got to making this work as reliably as possible.
By the way, I'm using .NET Framework 4.5.2, but I expect it also works with lower versions.