runtime: CultureInfo.TextInfo.ListSeparator broken in .Net 5

As part of updating of updating the Csv class in Signum.Utilities to .Net 5, I’ve realize that the value for ListSeparator has changed for many cultures that use , instead of . for decimals separators.

foreach (var ci in new [] { "", "en", "en-GB", "en-US", "es", "es-ES", "de", "de-DE", "fr", "fr-FR" })
{
	Console.WriteLine($"Culture {ci}\tListSeparator {CultureInfo.GetCultureInfo(ci).TextInfo.ListSeparator}");
}

netcoreapp3.1

Culture         ListSeparator ,
Culture en      ListSeparator ,
Culture en-GB   ListSeparator ,
Culture en-US   ListSeparator ,
Culture es      ListSeparator ;
Culture es-ES   ListSeparator ;
Culture de      ListSeparator ;
Culture de-DE   ListSeparator ;
Culture fr      ListSeparator ;
Culture fr-FR   ListSeparator ;

net5 rc2

Culture         ListSeparator ,
Culture en      ListSeparator ,
Culture en-GB   ListSeparator ,
Culture en-US   ListSeparator ,
Culture es      ListSeparator .
Culture es-ES   ListSeparator .
Culture de      ListSeparator .
Culture de-DE   ListSeparator .
Culture fr      ListSeparator  
Culture fr-FR   ListSeparator

Both tests in the same machine.

Previously I was using ListSeparator as a way to detect the separator that will be used in a Csv file. Is this the intended purpose?

About this issue

Original URL
State: closed
Created 4 years ago
Comments: 61 (37 by maintainers)

Commits related to this issue

temp fix ListSeparator https://github.com/dotnet/runtime/issues/43795 — committed to signumsoftware/framework by olmobrutall 4 years ago

Most upvoted comments

Sorry @tarekgh, but to me this surely looks like a bug. If you look at the relevant code at https://github.com/dotnet/runtime/blob/0e402bcd28f38a3b65720474dcd7a338cab0841d/src/libraries/Native/Unix/System.Globalization.Native/pal_localeStringData.c#L246 you can see that it isn’t even implemented (falls through to ThousandsSeparator). So either this is a bug in .Net Core or the info is generally not available in ICU, which would then still be a bug (And would not be an issue for CLDR if it is not part of it). As native german I can tell that using . as a list separator does not make any sense.

ANahr on Oct 25, 2020

I totally understand people used this APIs in wrong way when used it for CSV. I was trying to say neither .NET nor Windows documented anything saying that.

Windows PowerShell 5.1 documents the use of ListSeparator in CSV, and PowerShell 7 continues that; see the -UseCulture switch in ConvertFrom-Csv, ConvertTo-Csv, Export-Csv, and Import-Csv. I don’t think it has been filed as a bug, although the behavior is mentioned in https://github.com/PowerShell/PowerShell/issues/11754.

KalleOlaviNiemitalo on Oct 28, 2020

I have been using ListSeparator for human-readable output, never for parsing. I suspect that hardcoding "," or ", " would be acceptable for more cultures than copying the ThousandsSeparator.

KalleOlaviNiemitalo on Oct 26, 2020

I won’t repeat myself about the sources for the problem but in the end I see three issues:

Issue 1: The current output is unreasonable for major parts of the world. This is NOT complaining that it would have changed if it would have changed to something reasonable. But it did not. At least not for several/most non-en cultures.

Issue 2: Usage as a CSV separator: @tarekgh understands that “people used this APIs in wrong way when used it for CSV”. It doesn’t seem to matter that people used ListSeparator for over 2 decades for this purpose. Excel uses it, SPSS uses it, Powershell uses it (and yes I know NOT for 20 years), basically every general purpose CSV processor uses and/or supports it but they are all doing it wrong. And they all did it wrong for 20 years and more and didn’t even notice. But irony aside: This is a special case of Issue 1: The current output is unreasonable for use as a CSV separator for major/relevant parts of the cultures.

Issue 3: Usage for Excel-COM-Interop: That would really be an issue if ICU defined a list separator that was differing from the windows one. However it doesn’t even define one, so this seems pretty made up. And IF it defined one I would be pretty condident that it would at least be some reasonable one for the corresponding culture.

And just something that made me smile (no good smile): @tarekgh s “I was trying to say neither .NET nor Windows documented anything saying that.” With that argument you can also say that you must not use a String to store song lyrics because I bet that “neither .NET nor Windows documented anything saying that.”

ANahr on Oct 28, 2020

@tarekgh what’s wrong with Number symbols list element (https://unicode.org/reports/tr35/tr35-numbers.html#Number_Symbols)? Because for the few European cultures I checked that have ‘;’ as a list separator on Windows, this property is also ‘;’ in CLDR.

Also, the list pattern you mention is a generic list separator for itemizing AFAICT, whereas sList/ListSeparator is used to group numbers (where ‘,’ is not possible for European cultures, due to that character already being the decimal separator – and I can attest to this European usage of ”;” in case of ambiguities in case of writing number sequences and intervals).

The fact that all other characters in the Numbers section in the regional settings also correspond to the symbols section from CLDR also points at that particular solution.

Yes, ListSeparatorn is wonderfully vague on what kind of lists it is talking about, but judging from the context of the regional settings and its use by lots of software this way (Excel, CSV), I think the list symbol makes much more sense, and is probably mostly backward compatible with NLS data.

rubenprins on Nov 5, 2020

FWIW, you can also find that data here: https://github.com/unicode-org/icu/tree/master/icu4c/source/data/locales

i.e, for en locale: https://github.com/unicode-org/icu/blob/master/icu4c/source/data/locales/en.txt#L2014-L2019

safern on Nov 5, 2020

Just curious @tarekgh what part of CLDR does that come from?

Every locale has a properties listed under listPattern section, something like

		<listPattern>
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0}, and {1}</listPatternPart>
			<listPatternPart type="2">{0} and {1}</listPatternPart>
		</listPattern>

You can reach the data if you download the CLDR http://unicode.org/Public/cldr/38/cldr-common-38.0.zip then open the main folder which include locale files. You can look at root locale files (e.g. en.xml) to find all properties including the list pattern.

What I actually did is I called ICU APIs that format the list and then I extracted the separator for every locale from the API output. let me know if you need more details.

tarekgh on Nov 5, 2020

Just to update, I’ll try to look at getting some fix in the 5.0 servicing release (something like 5.0.1) so we don’t have to wait to 6.0.

tarekgh on Oct 31, 2020

I think that focusing on Excel and CSV files all the time muddles the issue: using the thousands separator as a list separator is the worst possible fallback that could have been used. Hard-coding a comma would even be better (albeit not what Excel would expect).

A “.” or non-breaking space is never a list separator, AFAICT. The choice for the thousands separator works exclusively for locales using English-based number formatting, that happens to use the comma, almost by accident.

rubenprins on Oct 27, 2020

This is a regression for .Net Core 3.1 on Windows -> .Net 5
This is a regression for ANY .Net Framework -> .Net 5 Imho this is a serious issue that should be fixed asap. Is just hard to spot because it a) by chance does not hit any EN-cultures in default cases (no user overrides) b) affects processes that are untestable or at least problematic to test automatically.

As much as I hate this option but even throwing a PlatformNotSupportedException would be far better than the current situation. For a short term personally I’d be pragmatic and try to find a heuristic that at least handles the 95% cases (Like all EN-Cultures get “,”, all others get “;”).

I have seen three types of usage in our code and the current situation breaks all of them:

User-(UI)-Text output of list elements. This is generally the most unproblematic, however would be very annoying because e.g. “.” is in NO culture worldwide a reasonable list separator, however several cultures (like DE) deliver that. Imagine: “Data: 34,43.34.42,43”
CSV Export/Import: CSV formats are usually/always culture specific. Using the ListSeparator is a reasonable default that will work in the majority of cases wordwide (or at least in countries for which we develop software).

3a) Excel-“Interfacing” through “excel.exe export.csv”: This currently works flawlessly because excel uses the same system list separator. This would be very inconvenient to implement yourself because it would require native dependecies to get user locale overrides (which excel also uses).

3b) Excel-Interfacting though COM-Automation: Although Excel offers functions to get internationalization these are known to not work in lots of situations (e.g. multilanguage Excel install but lots more) whereas list separator basically always works (unless VERY stupid user locale overrides, in which Excel itself will not work correctly anyways)

Considering the “workarounds”:

Using Invariant: This is no workaround. We are talking about culture specific functions in the first place. If you could use Invariant there would be no need to use any culture-specific function at all.
Using the Config-Switch: This is highly problematic. CSV/Export is usually defined in libraries. As far as I see this is only available as an app-wide switch. Internally we have several hundered applications and there are more which are developed by customers over which we have no control at all. Even just internally this would be a huge undertaking just to inform all product owners, find out who even uses these libraries, etc. and it would be terribly error-prone.
Stop using ListSeparator and implement our own. Partially possible but a really bad situation for the following reasons: A) Code duplication B) Risk of multiple differing implementation and worst: C) would likely require native code/pinvokes for getting user overrides from the OS (which comes with a lot of additional problems). D) Efford E) Additional error source (might be overlooked in the future)

ANahr on Oct 26, 2020

If I didn’t hear from you, I’ll go with the NLS data to keep the compatibility and in the future releases we can expose new APIs if needed for list patterns.

tarekgh on Nov 10, 2020

@rubenprins @KalleOlaviNiemitalo TextInfo.ListSeparator is not for numbers only. it can separate any kind of data. This is why list pattern would fit better. Also if you look at https://unicode.org/reports/tr35/tr35-numbers.html#Number_Symbols it is clearly says

- symbol used to separate numbers in a list intended to represent structured data such as an array; must be different from the decimal value. This list separator is for “non-linguistic” usage as opposed to the listPatterns for “linguistic” lists (e.g. “Bob, Carol, and Ted”) described in Part 2, Section 11 List Patterns.

Which means this number separator is not for linguistic while TextInfo.ListSeprator is a cultural property which should be linguistic. That is exactly match the list pattern in CLDR.

As @KalleOlaviNiemitalo indicated, even CLDR number separator will not be compatible with NLS. If really non-linguistic separator needed here, then I would say let it be , for all locales which I think will make CSV/Excel scenarios less broken.

To reiterate on CSV/Excel, I am still not seeing a way can guarantee ListSeparator will make this scenario reliable. This is wrong usage of ListSeparator and apps should depend on the real Excel doc separator. We’ll keep ListSeparator read the user override for the current culture so most of the scenarios (as @KalleOlaviNiemitalo thankfully pointed at) like PoswerShell and COM interop will continue work as it used to be.

If people prefer the NLS compatibility, we can keep this but this is the least preferable to me because this is Windows specific and also this data can change at any time too in Windows. I am still thinking the data I mentioned https://github.com/dotnet/runtime/issues/43795#issuecomment-722049526 would be the most sensible data we can return for this separator. what you think?

tarekgh on Nov 6, 2020

CLDR 38 has a semicolon ; without spaces as the number list separator in en.xml, and en_US.xml does not override it. So it’s not the same as the Windows NLS data, which has used a comma , for en-US.

Anyway, I now think the CLDR number list separator is a better fit for TextInfo.ListSeparator than what one could extract from list patterns, because it does not end with a space. An API for linguistic list formatting using CLDR list patterns could then be added in .NET 6 or 7 as a separate issue.

(The country-dependent list separator was apparently added to INT 21H AH=38H in MS-DOS 3.0, and only had room for one byte of text and a null character.)

KalleOlaviNiemitalo on Nov 5, 2020

@danmosemsft I added Preview 4 section in the doc and listed this separator issue under it.

tarekgh on Nov 2, 2020

And what is it intended for then? The first 1% of library for displaying list localized for humans?

How you plan to fit https://unicode-org.github.io/cldr-staging/charts/37/by_type/miscellaneous.displaying_lists.html#4f8acaf2d32aff3a into one string property?

Over an already ambiguous documentation "Gets or sets the string that separates items in a list." you put a warning discouraging the usage. Is there any real use case for this property using the new implementations? And how you justify the . being returned for es/ de/fr/…? There is no language that uses . as list separator, no matter the interpretation.

I think the problem is clear:

The code is clearly broken, it was identified as broken in .Net 3.1 in UNIX and now is going to affect many people.

The proper implementation is not trivial. If ICU will have an equivalent of LOCALE_SLIST nobody would be trying to redefine the meaning of ListSeparator, but it doesn’t.

Also, you all are about to release .Net 5 and you don’t want/can’t make changes in the code, like marking it as Obsolete, so you’re trying to sell me that this is not a bug, it’s a preciously crafted fallback implementations that has been carefully designed.

I can understand deadlines, but just be honest.

olmobrutall on Oct 28, 2020

It’s true that the documentation of ListSeparator and LOCALE_SLIST is intentionally ambiguous, but here more evidence that THE WORLD is using ListSeparator and LOCALE_SLIST for localized excel CSVs.

https://community.intel.com/t5/Intel-Fortran-Compiler/CSV-delimiter-detection/td-p/956160 http://www.dataaccess.com/KBasePublic/KBPrint.asp?ArticleID=1388 https://it.mathworks.com/matlabcentral/answers/343998-how-can-i-recognize-automatically-the-list-separator-in-the-region-and-languages-settings http://www.vbaexpress.com/forum/showthread.php?42769-save-as-csv-file-format/page2 https://www.mrexcel.com/board/threads/reading-large-dataset-in-excel.929526/ https://forums.ni.com/t5/LabWindows-CVI/Creating-CSV-file-according-with-the-international-settings/td-p/3184019?profile.language=en http://archives.miloush.net/michkap/archive/2007/08/15/4396922.html https://engineertips.wordpress.com/2018/06/07/local-csv-separator-in-delphi/ https://www.generacodice.com/en/articolo/130293/How-to-read-'List-separator'-from-OS-in-Java http://www.office-loesung.de/ftopic87375_0_0_asc.php https://codereview.stackexchange.com/questions/126146/change-list-separator-parse-file-restore-list-separator-to-original-value https://www.experts-exchange.com/questions/24491939/Convert-a-xls-File-to-a-CSV-file-SemiColon-Seperated.html https://living-sun.com/excel/249689-how-to-export-excel-to-csv-file-with-ldquordquo-delimted-and-utf-8-code-excel-vba-csv-utf-8.html http://www.delphigroups.info/2/10/861201.html https://forums.codeguru.com/showthread.php?348853-Regional-config-of-panel-control

Do you really need more evidence?

olmobrutall on Oct 27, 2020

But the current behavior of ListSeparator is just a fall-through. I understand that the .Net 5.0 needs to be released, maybe the best solution will be to mark the property as [Obsolete] or throw a NotImplementedException for now.

I don’t think it is a good idea to throw or mark it obsolete. we are going to fix it in the next releases anyway. Anyone can switch back to the old behavior anyway using the config switch. And the current behavior is already used on Linux for long time now.

Thanks for all your feedback and discussion and feel free to ping us if you get any more questions.

tarekgh on Oct 26, 2020

Hi @tarekgh,

If you use Excel in Spain or Germany (any most (all?) of the continental Europe) and export to CSV instead of:

Year,Make,Model,Length
1997,Ford,E350,2.35
2000,Mercury,Cougar,2.38

you get

Year;Make;Model;Length
1997;Ford;E350;2,35
2000;Mercury;Cougar;2,38

Notice how the decimal numbers use ,, so we use ; to separate cells.

I’m sure there are ways to configure Excel to export the english way, but CSV is typically used for one time data-loading scenarios connecting different departments, provided by the customer or other third party company, downloaded from internet, etc… it is very convenient to be able to set the culture once, and the CSV library switches number format, date format, and separator.

Csv.ReadFile<ProductInfoCsv>("productos.csv", culture: CultureInfo.GetCulture("es-ES")).Select(prod => ...)

By using the code that I mentioned (ups… corrected now):

public static string GetListSeparator(this CultureInfo culture) => culture.NumberInfo.NumberDecimalSeparator == ',' ? ";" : ",";

This will work for the European cultures and for the English cultures, but if someone from the second group (for example latin america es-DO, es-GT, es-HN, es-NI, es-PA, es-PE, es-PR) exports from Excel to CSV, maybe it will produce something like this:

Year;Make;Model;Length
1997;Ford;E350;2,35
2000;Mercury;Cougar;2.38

(notice the . in the decimal number. Confirmation of this behavior pending!)

And my simple heuristic won’t work, because it will assume that is english style and expect , to be the separator.

@KalleOlaviNiemitalo I don’t think using ", " or "," is useful for anybody, why use CultureInfo.Current.TextInfo.ListSeparator instead of just ", "? Just so maybe they change it in the future and what you formatted is completely crazy in other languages?

Remember that in Europe we use , to enumerate stuff, not ;. The ; is just a Excel CSV thing.

Also, checking for TextInfo.ListSeparator in google, looks like the people using it right now are CSV libraries:

https://dotnetfiddle.net/MAm1t1 https://github.com/JoshClose/CsvHelper/issues/918

olmobrutall on Oct 26, 2020

by the way, does your product currently support running on Linux?

While I have applications running in Docker for production, parsing Csv files is typically done in ETL processes that we run in windows, so it has gone unnoticed for now.

About GetListSeparator, the current implementation of ListSeparator not only return null or string.Empty. Also returns . in a lot of cases.

For now I use:

public static string GetListSeparator(this CultureInfo culture) => string.IsNullOrWhiteSpace(culture.NumberInfo.NumberDecimalSeparator) == ',' ? ";" : ",";

This works for the first 180 cultures (english group) and the this group of 375 (europe group) but leaves the other cultures broken.

Can someone from this groups check what excel does when you export to CSV?

olmobrutall on Oct 26, 2020