runtime: unable to parse '-1' string in Norwegian

using System;
namespace wtfdot
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Threading.Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("nb-NO");
            int.Parse("-1");
        }
    }
}

produces

Unhandled Exception: System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(ReadOnlySpan`1 str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(ReadOnlySpan`1 s, NumberStyles style, NumberFormatInfo info)
   at System.Int32.Parse(String s)

dotnet --version

2.2.103

lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (15 by maintainers)

Most upvoted comments

@veonua to summarize this thread:

  • If the nb-NO culture uses a negative sign other than the hyphen (0x002D), that was presumably done for a good reason. If you think this is wrong, you can raise your concern with CLDR to get it fixed.
  • If the culture did not choose the hyphen (0x002D) as its negative sign, then we cannot assume it is one either. The hyphen (0x002D) may have another meaning in that culture, which we cannot know. Imagine a culture deciding to use the hyphen (0x002D) as its thousands separator, for instance.
  • The best practice to guarantee parsing is to format the number with the invariant culture and then parse it with the invariant culture. If you don’t have control over the source of the string you are parsing, then Invariant is still the best guess. You cannot magically make a number formatted with one culture parseable with another. Imagine you formatted a number with a thousands separator in an Arabic culture and then used a Spanish culture to parse it: this is expected to fail. The parser cannot make assumptions about every possible culture’s data, especially since any sign character can mean something else in a different culture.
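The round trip described in the last bullet can be sketched as follows. This is a minimal illustration rather than code from the thread; the class name `InvariantRoundTrip` and its helpers are hypothetical, and only standard `System.Globalization` calls are used.

```csharp
using System;
using System.Globalization;

static class InvariantRoundTrip
{
    // Format with the invariant culture so the text does not depend on the machine's locale.
    public static string Format(int value) =>
        value.ToString(CultureInfo.InvariantCulture);

    // Parse with the same invariant culture that produced the text.
    public static int Parse(string text) =>
        int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture);

    static void Main()
    {
        // Run under the Norwegian culture, as in the original report.
        CultureInfo.CurrentCulture = new CultureInfo("nb-NO");

        string text = Format(-1);          // written with the invariant negative sign
        Console.WriteLine(text);
        Console.WriteLine(Parse(text));    // read back with the invariant culture
    }
}
```

Because both directions name the invariant culture explicitly, the current thread culture plays no part in the result.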

With that, I am closing this issue, but feel free to reply if you have any more questions. Thanks.

@danmosemsft If what 99 % of users of a culture type can’t be parsed correctly, then I think it makes int.Parse nigh useless for that culture and I would consider that a bug.

The right fix might be to change CLDR data for the Norwegian culture, but I think the .Net team is in a better position to attempt to make that change, than a random person.

@danmosemsft yes this is a good summary.

To be more helpful, here is the link where anyone can report an issue to CLDR: http://cldr.unicode.org/index/bug-reports

Also, I want to mention that we have fixed other parsing issues for the nb-NO culture before, so we really do care about customers when we have more control over the issue. @svick, sorry if I left a bad impression when I closed the issue, but what @danmosemsft mentioned explains why I closed it.

@tannergooding Windows is trying to get as close to CLDR as it can, but some differences are still to be expected (some of them intentional). I still think it would be best for this problem to be fixed in CLDR (if it really is considered a problem for that culture). Anyway, there is a lot of collaboration between CLDR and Windows, and it is heading in a very good direction.

@svick - ah, I see. We ask folks to report directly to CLDR because we do not have expertise in the culture in which the issue is being reported, and therefore only make things less efficient trying to be in the “middle” of any discussion. I believe CLDR is the closest to a standard across Windows and Unix - it is not something niche that .NET chose to depend on. If the issue was something specific to .NET’s use of ICU/CLDR then we probably would want to be involved - that’s not the case for the choice of negative sign in nb-NO. @tarekgh is that a reasonable summary?

@danmosemsft

do you agree, that we cannot generally parse numbers correctly without knowing the culture? For example, 100,123 is a much smaller number in fr-FR than in en-US?

Of course.

In which case I think it comes down to a possible bug in the culture data, which is coming from CLDR. We do not want to get in the business of defining our own culture data as it is complex and ever changing.

I understand that. My problem is that what I see here is:

This is not a bug in our code, so we’re going to close this issue and we will leave it to someone else to fix CLDR.

The attitude I would like to see is:

This problem is affecting our customers, so we will keep this issue open and we will work with the maintainers of CLDR on resolving it.

@svick do you agree, that we cannot generally parse numbers correctly without knowing the culture? For example, 100,123 is a much smaller number in fr-FR than in en-US?

In which case I think it comes down to a possible bug in the culture data, which is coming from CLDR. We do not want to get in the business of defining our own culture data as it is complex and ever changing.
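The 100,123 example above can be checked with a short sketch. The class name `SeparatorDemo` and the helper `ParseWith` are hypothetical; the sketch assumes the platform provides fr-FR and en-US culture data, with the comma as decimal separator in fr-FR and as group separator in en-US.

```csharp
using System;
using System.Globalization;

static class SeparatorDemo
{
    // Parses the text using the number format of the named culture.
    public static decimal ParseWith(string text, string cultureName) =>
        decimal.Parse(text, NumberStyles.Number, CultureInfo.GetCultureInfo(cultureName));

    static void Main()
    {
        const string text = "100,123";

        decimal french = ParseWith(text, "fr-FR");    // comma read as decimal separator
        decimal american = ParseWith(text, "en-US");  // comma read as group separator

        // Print with the invariant culture so the output itself is unambiguous.
        Console.WriteLine(french.ToString(CultureInfo.InvariantCulture));
        Console.WriteLine(american.ToString(CultureInfo.InvariantCulture));
        Console.WriteLine(french < american);
    }
}
```

The same characters yield two different values, which is why the parser needs to be told which culture produced the input.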

@tarekgh

If you think this is wrong, you can raise your concern to CLDR to fix that.

Like I said above, I think this is a bug in .Net. And I think fixing those should be a responsibility of the .Net team, even if the ultimate source of the bug is in a third-party dependency.

if you don’t have control over the source of the string you are parsing, then Invariant will still be the best guess to use. You cannot just magically make any number formatted with some culture can be parsed with other culture.

The invariant culture is not a good option for parsing strings from users. And it seems like you’re saying that it’s fine if using the actual culture also doesn’t work correctly for that. This is not about having a string that uses one culture and parsing it with another culture. It’s about parsing strings from Norwegian users using the Norwegian culture not working.

In my opinion, int.Parse should correctly parse what a regular user of a given culture is likely to type. It’s not good enough if it’s only supposed to parse the result of int.ToString().

My expectation is that the framework would isolate me from platform- and culture-specific issues

That’s what CultureInfo.InvariantCulture is for. Pass that to your Parse call as the provider, or set it as the current culture, and regardless of locale you’ll get the same parsing behavior.
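Passing the provider explicitly might look like the sketch below. The class name `ExplicitProvider` and the helper `TryParseInvariant` are hypothetical. Whether the culture-sensitive call succeeds depends on the platform's nb-NO data, so no output is claimed for it.

```csharp
using System;
using System.Globalization;

static class ExplicitProvider
{
    // Parses with the invariant culture no matter what the current culture is.
    public static bool TryParseInvariant(string text, out int value) =>
        int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);

    static void Main()
    {
        CultureInfo.CurrentCulture = new CultureInfo("nb-NO");

        // Uses the current culture; the outcome depends on the platform's nb-NO data.
        bool withCurrent = int.TryParse("-1", out _);

        // Uses the invariant culture, whose negative sign is the hyphen (0x002D).
        bool withInvariant = TryParseInvariant("-1", out int value);

        Console.WriteLine($"current culture: {withCurrent}");
        Console.WriteLine($"invariant culture: {withInvariant} ({value})");
    }
}
```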

The right fix might be to change CLDR data for the Norwegian culture

Yes. I don’t believe we should be second-guessing the data from ICU / the OS. If there’s an issue with that data, we should work with the provider of it to fix it.

cc: @tarekgh, @krwq

It seems we are operating by design - we expect to be running int.Parse in the same culture that the input was formatted in.

We don’t in general know all the values that NumberFormatInfo.NegativeSign may take today and in the future on various cultures so it seems like we could not reasonably be more tolerant.
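To see which negative sign a given culture actually reports on the current machine, something like the following sketch can help. The class name `NegativeSignInspector` and the helper `Describe` are hypothetical. The values printed come from the platform's culture data (ICU or the OS) and can differ between systems, so no output is shown.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class NegativeSignInspector
{
    // Renders each UTF-16 code unit of a string as U+XXXX.
    public static string Describe(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    static void Main()
    {
        CultureInfo[] cultures =
        {
            CultureInfo.InvariantCulture,
            CultureInfo.GetCultureInfo("en-US"),
            CultureInfo.GetCultureInfo("nb-NO"),
        };

        foreach (CultureInfo culture in cultures)
        {
            // The invariant culture has an empty name, so give it a readable label.
            string label = culture.Name.Length == 0 ? "invariant" : culture.Name;
            Console.WriteLine($"{label}: {Describe(culture.NumberFormat.NegativeSign)}");
        }
    }
}
```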

@veonua is it possible to set the thread culture to match the culture the number was formatted in? If it might have either nb-NO or (say) en-US negative signs, you might have to use “TryParse” and fall back from one to the other if the first fails.
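The fallback suggested above could be sketched like this. The names `FallbackParser`, `TryParseAny`, and `MinusSignFormat` are hypothetical. To stay independent of any platform's nb-NO data, the sketch builds a stand-in `NumberFormatInfo` whose negative sign is U+2212 (MINUS SIGN). This stand-in is an assumption for the demo, not a claim about what nb-NO reports on a given system.

```csharp
using System;
using System.Globalization;

static class FallbackParser
{
    // Tries each provider in order and reports the first successful parse.
    public static bool TryParseAny(string text, out int value, params IFormatProvider[] providers)
    {
        foreach (IFormatProvider provider in providers)
        {
            if (int.TryParse(text, NumberStyles.Integer, provider, out value))
                return true;
        }
        value = 0;
        return false;
    }

    // Stand-in for a culture whose negative sign is U+2212 (an assumption for this demo).
    public static NumberFormatInfo MinusSignFormat()
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NegativeSign = "\u2212";
        return format;
    }

    static void Main()
    {
        NumberFormatInfo minusSign = MinusSignFormat();

        // One input uses the hyphen, the other uses U+2212 as its sign.
        foreach (string text in new[] { "-1", "\u2212" + "1" })
        {
            // Try the invariant culture first, then fall back to the stand-in format.
            bool ok = TryParseAny(text, out int value, CultureInfo.InvariantCulture, minusSign);
            Console.WriteLine($"{ok}: {value}");
        }
    }
}
```

The order of the providers encodes which interpretation is preferred, so an input that is valid under more than one culture is read according to the first one listed.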