msbuild: Cultures aliased by ICU cannot be used for resource localization on non-Windows environments
From @CodingDinosaur on October 12, 2018 3:30
When building or running under .NET Core on a Unix-based environment, certain cultures cannot be utilized for resource localization, such as getting localized strings. This impacts both the build process (e.g., identifying and processing resource files) and the lookup of resources at runtime. These cultures can be used as expected when building or running under Windows.
The affected cultures are those which are aliased by ICU – that is, to save on DB space for certain cases, ICU defines some locales as an “alias” of another. There are 42 locale aliases in ICU 57; two of the most common are zh-TW and zh-CN. For a full list of affected locales, see: ICU Aliased Locale List - CultureIssueDemonstration Readme.
This platform-inconsistent behavior when localizing certain resources necessitates special code and workarounds at both build and deploy time when developing cross-platform applications.
See a demo of this issue in CodingDinosaur/CultureIssueDemonstration
Symptoms
- Resource files for the affected locales will not be generated into resource assemblies if building in a Unix-based environment.
- For example: MyStrings.zh-TW.resx will not have a resource assembly created if the build occurs on a Unix-based environment, but will if built under Windows
- Resources for the affected locales will not be utilized even if the resource files are present, falling back to the default resources.
- For example: Consider MyStrings.resx and MyStrings.zh-TW.resx. If a custom build step is used to generate or copy the resource assembly for MyStrings.zh-TW.resx to work around the first issue, requesting a resource with culture zh-TW will still return the resource from the default MyStrings.resx resources.
- Affected cultures are missing when running under Unix-based environments and calling CultureInfo.GetCultures
- Some culture data for affected cultures is platform inconsistent, notably the parent locale
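The last two symptoms can be observed with a small probe like the one below. This is a minimal sketch, not code from the demo repos: the class name `AliasedCultureProbe` and the choice of `en-US` as a comparison culture are assumptions made for illustration. It only uses `CultureInfo.GetCultures`, `CultureInfo.GetCultureInfo`, and `CultureInfo.Parent`, and its output is expected to differ between Windows and Unix-like environments, so no expected output is shown.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class AliasedCultureProbe
{
    // True when the culture name shows up in the enumerated list
    // returned by CultureInfo.GetCultures.
    public static bool IsEnumerated(string name) =>
        CultureInfo.GetCultures(CultureTypes.AllCultures)
            .Any(c => string.Equals(c.Name, name, StringComparison.OrdinalIgnoreCase));

    public static void Main()
    {
        // zh-TW and zh-CN are the aliased cultures named in this issue;
        // en-US is included only as a non-aliased comparison (an assumption).
        foreach (string name in new[] { "zh-TW", "zh-CN", "en-US" })
        {
            // Creating a culture by name is a separate code path from enumeration.
            CultureInfo culture = CultureInfo.GetCultureInfo(name);
            Console.WriteLine(
                $"{name}: enumerated={IsEnumerated(name)}, parent={culture.Parent.Name}");
        }
    }
}
```

Running the same probe on Windows and inside a Linux container makes the platform difference directly comparable.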
Expected behavior
- Primarily, that whatever the “correct” behaviors for these locales are (as it relates to resources and CultureInfo) they be consistent between Windows and Unix-like environments.
- Secondarily, that aliased locales would “just work” in .NET Core for resource localization - e.g., that we could have resources defined as zh-TW, just as we can on Windows today, and properly retrieve the expected resources.
Brief Analysis
Most of the above symptoms boil down to uloc_getAvailable in ICU’s C API.
For example, the zh-TW resource files do not get copied during build, because during the task SplitResourcesByCulture, the culture is validated against a cache based ultimately on CultureInfo.GetCultures, which in turn, on Unix, ultimately relies on ICU. A diagnostic MSBuild log shows why the file is missing:
Removed Item(s):
_MixedResourceWithNoCulture=
Resources/MyNetCoreProject.MyResources.zh-TW.resx
OriginalItemSpec=Resources/MyNetCoreProject.MyResources.zh-TW.resx
TargetPath=Resources/MyNetCoreProject.MyResources.zh-TW.resx
WithCulture=false
From which we can follow back to the offending path:
Microsoft/msbuild/src/Tasks/Microsoft.Common.CurrentVersion.targets - SplitResourcesByCulture -> Microsoft.Build.Tasks.AssignCulture.Execute -> Microsoft.Build.Tasks.Culture.GetItemCultureInfo -> Microsoft.Build.Tasks.CultureInfoCache.IsValidCultureString -> Microsoft.Build.Shared.AssemblyUtilities.GetAllCultures -> CultureInfo.GetCultures -> CultureData.GetCultures -> CultureData(Unix).EnumCultures -> System.Globalization.Native/locale.cpp:GlobalizationNative_GetLocales
https://github.com/dotnet/coreclr/blob/8ba838fb54d6c07271d026b2d77bedcb9e2a786a/src/corefx/System.Globalization.Native/locale.cpp#L162-L171
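The effect of that call chain can be sketched as follows. This is a simplified illustration under stated assumptions, not MSBuild's actual implementation: `CultureCacheSketch` and `GetCultureCandidate` are hypothetical names, and the real task does more than split on dots. The point is only that a cache built from `CultureInfo.GetCultures` cannot accept a culture the platform does not enumerate.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class CultureCacheSketch
{
    // Built once from the platform's enumerated cultures, so a culture that
    // the platform does not enumerate can never be considered valid.
    private static readonly HashSet<string> ValidCultureNames = new HashSet<string>(
        CultureInfo.GetCultures(CultureTypes.AllCultures).Select(c => c.Name),
        StringComparer.OrdinalIgnoreCase);

    public static bool IsValidCultureString(string name) =>
        name.Length > 0 && ValidCultureNames.Contains(name);

    // Hypothetical helper: treats the segment between the last two dots of the
    // file name as the culture candidate, e.g. "zh-TW" in "MyResources.zh-TW.resx".
    public static string GetCultureCandidate(string path)
    {
        string withoutExtension = Path.GetFileNameWithoutExtension(path);
        string candidate = Path.GetExtension(withoutExtension);
        return candidate.Length > 1 ? candidate.Substring(1) : string.Empty;
    }

    public static void Main()
    {
        string path = "Resources/MyNetCoreProject.MyResources.zh-TW.resx";
        string candidate = GetCultureCandidate(path);
        // The validity result depends on the platform's culture data.
        Console.WriteLine($"candidate={candidate}, valid={IsValidCultureString(candidate)}");
    }
}
```

When the validity check fails, the file is treated as having no culture, which matches the WithCulture=false entry in the log above.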
ICU does not return aliases when getting a list of locales – whether with uloc_getAvailable or Locale::getAvailableLocales (and uloc_countAvailable does not include them in its count).
That ICU does not return the aliases in this manner appears to be intentional, both based on the numerous references to a lack of alias mapping in the uloc documentation, and the following bug:
https://unicode-org.atlassian.net/browse/ICU-4309
uloc_getAvailable returns sr_YU, even though it is an %%ALIAS locale. None of the other %%ALIAS locales are returned.
TracBot made changes - 01/Jul/18 1:59 PM
Resolution Fixed [ 10004 ]
Status Done [ 10002 ] Done [ 10002 ]
ICU-4309 was fixed via: https://github.com/unicode-org/icu/commit/ab68bb319601bc467784dcbdcc6d52131a2863d2
This seems to further indicate that ICU not returning aliases when calling uloc_getAvailable is intentional.
In-Depth Analysis
A full analysis can be seen in the test repo README: CodingDinosaur/CultureIssueDemonstration
Test Repos
I have two test repos that help demonstrate this issue:
CultureIssueDemonstration
- https://github.com/CodingDinosaur/CultureIssueDemonstration
- Demonstrates the symptoms described at both build-time and run-time
- The provided test scripts run the test code under your current platform, or on Linux via Docker. It is therefore recommended to run under Windows with Docker installed, to compare the results under both Windows and Linux.
- Contains a README file that goes into more detail on the issue and the apparent cause
CultureIcuTest
- https://github.com/CodingDinosaur/CultureIcuTest
- Demonstrates the results of using the method utilized in mscorlib to get all locales along with similar methods in ICU directly
Copied from original issue: dotnet/coreclr#20388
About this issue
- State: closed
- Created 6 years ago
- Reactions: 13
- Comments: 62 (38 by maintainers)
Commits related to this issue
- Add symlinkc for chinese localization Attempt to solve https://github.com/microsoft/msbuild/issues/3897 — committed to JustArchiNET/ArchiSteamFarm by JustArchi 5 years ago
- Add support for aliased cultures Fixes #3897 — committed to 0xced/msbuild by 0xced 3 years ago
- Use the neutral cultures zh-Hans and zh-Hant Problem: Refer to https://github.com/dotnet/msbuild/issues/3897 When building with dotnet cli, certain cultures cannot be utilized for resource localizati... — committed to botcher/ModernWpf by botcher 2 years ago
Is there any update to this issue, or at least a workaround?
My understanding is that it’s currently not possible to have an ASP.NET Core application localized with zh- cultures on Linux, which seems like a pretty common use case.
Any updates? msbuild and the dotnet cli behave differently on Windows, and it bothers me a lot.
Any progress?
me too.
There should be a new daily build available in a few days but the next official release (RC1) won’t be available until September.
In your case, it is not recommended. This one will have the same effect as if you set the property in the csproj.
One last recommendation to @Bartleby2718: try not to set System.Globalization.UseNls in the csproj, and instead use the environment variable on the Windows build machine. The reason is that setting System.Globalization.UseNls in the csproj forces the app to run using NLS, which I don’t think you need. We are only trying to work around the resource build issue here, not change the app’s behavior.
@tarekgh Thanks for pointing that out! I have updated my comment above accordingly.
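To see which of the two settings discussed above is visible to a running process, something like the following could be used. This is a rough sketch under assumptions: `NlsSettingsReport` is a hypothetical name, and the code only reports how the process was configured (the `DOTNET_SYSTEM_GLOBALIZATION_USENLS` environment variable and the `System.Globalization.UseNls` runtime configuration switch). It does not prove which globalization mode the runtime actually selected.

```csharp
using System;

public static class NlsSettingsReport
{
    public static string Describe()
    {
        // Set on the machine or shell, affects every process that inherits it.
        string envValue =
            Environment.GetEnvironmentVariable("DOTNET_SYSTEM_GLOBALIZATION_USENLS");

        // Comes from the app's runtime configuration (for example when the
        // property is set in the csproj), so it travels with the app itself.
        bool hasSwitch =
            AppContext.TryGetSwitch("System.Globalization.UseNls", out bool switchValue);

        string envText = envValue ?? "(not set)";
        string switchText = hasSwitch ? switchValue.ToString() : "(not set)";
        return $"env={envText}, runtimeconfig={switchText}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If only the environment variable is reported as set, the app's own configuration is unchanged, which is the outcome the recommendation above is aiming for.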
It appears that the changes I made to the environment variable were not applied for some reason. Properly enabling NLS mode answered two of my questions, but I’m still curious about the other two. I’m especially curious about how long .NET will support NLS mode through runtime configuration. Will it survive in .NET 7?
Reading these two comments suggests the NLS mode is not enabled correctly. Could you add the following line in your code and send the output?
Note: I have updated some of these after @tarekgh let me know (and I confirmed) that the NLS mode is not enabled.
@tarekgh I’ve tried setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1, but the results don’t exactly match what I expected. @BenVillalobos Could you please help me understand the results? I have a list of questions at the bottom.
Command Ran on Git Bash - Linux Container version
Command Ran on Git Bash - Windows version
Specs for the Windows Machine Used
Setup
I tried 4 different cases. In each case, I:
- Set System.Globalization.UseNls appropriately in the relevant csproj file.
- Printed CultureInfo.GetCultures(CultureTypes.AllCultures) to a file.
Results
- System.Globalization.UseNls true: 857 cultures printed (corrected from 813); 783 cultures printed
- System.Globalization.UseNls false: 813 cultures printed; 783 cultures printed
Miscellaneous Info
- <TargetFramework>net6.0</TargetFramework> in My.Project.csproj.
Questions:
- OS build should be below 18362.116, per this page.) This means that my app would use ICU globalization APIs by default, so setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 should have resulted in a difference in CultureInfo.GetCultures(CultureTypes.AllCultures). However, I got the same set of cultures whether I ran the app on the Windows host or inside the Linux container. What’s the reason behind this?
- System.Globalization.UseNls in the relevant .csproj file and observed that a) CultureInfo.GetCultures(CultureTypes.AllCultures) had more cultures, including zh-CN. This behavior now makes sense to me.
- DOTNET_SYSTEM_GLOBALIZATION_USENLS (or equivalent solutions like using runtimeconfig.json) can be a viable solution for now. Is this setting going to be around for a long time?
- zh-CN. However, I expected zh-CN to show up when using NLS. Why is this?
- My.Project does have zh-CN resources, but how is msbuild able to create the zh-CN directory in bin/Debug (or bin/Release) even when zh-CN is not in CultureInfo.GetCultures(CultureTypes.AllCultures)?
Thank you very much in advance, and I look forward to hearing back from you.
Q&A time 🤔
@madelson
Definitely during build time. Can you describe your runtime scenario? Does this just mean “calling our API for valid cultures?” If so, it applies to both.
If we add this workaround as “if it’s not seen in the culture API, use our hardcoded list as a backup,” I expect windows/non-windows to behave the same. The only situation to be worried about would be some culture alias not existing in the hardcoded list, so it would still fail on unix. We’d need to handle those on a case by case basis.
I believe it’ll work as long as the env var is set by the time our API gets called. Though consider that ValidCultureNames is loaded into a static hashset within CultureInfoCache.
cc @marcpopMSFT. The situation that concerns me is getting a flood of “we need this alias to be supported” in the long term, which isn’t very maintainable.
@marcpopMSFT
@Bartleby2718
Starting up a PR for it as soon as this comment gets posted.
It’s a new codepath within our binaries, so it’d need an upgrade to net7 unless we backport.
Makes sense. I’ll try that as I wait for the response to other questions in this thread.
@tarekgh whoops! Thanks @CodingDinosaur ❤️
Just to clarify, this was written by @CodingDinosaur 😃
investigation notes:
https://github.com/CodingDinosaur/CultureIssueDemonstration#apparent-root-cause is a fantastic writeup, thanks @codingdinosaur!
Off the top of my head, it sounds like a valid workaround would be to allow an opt-in (environment variable?), and follow the older hardcoded culture code path. In this case, something like
The Quick Workaround
- FEATURE_CULTUREINFO_GETCULTURES flag from https://github.com/dotnet/msbuild/blob/a44cc43931208ecdac42a1023ce79d7b2bd6956e/src/Tasks/CultureInfoCache.cs#L62-L68
- Traits.cs rather than a straight up env var check): https://github.com/dotnet/msbuild/blob/a44cc43931208ecdac42a1023ce79d7b2bd6956e/src/Tasks/CultureInfoCache.cs#L26-L32
Better yet: it should check the hardcoded list as a backup if the culture isn’t seen during GetAllCultures. This would help us avoid needing to update the list with all supported cultures over time.
Will bring this up with the team. @madelson, would that be an acceptable workaround for you? It would require setting an environment variable on unix machines.
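The "hardcoded list as a backup" idea might look roughly like this. It is a sketch under assumptions rather than the eventual MSBuild change: `CultureFallbackSketch` is a hypothetical name, and the two-entry fallback set is an illustrative subset, not MSBuild's actual hardcoded list.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CultureFallbackSketch
{
    // Illustrative subset only (assumption); the real hardcoded list is much longer.
    private static readonly HashSet<string> HardcodedCultureNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "zh-CN", "zh-TW" };

    // Cultures the current platform enumerates, loaded lazily and cached.
    private static readonly Lazy<HashSet<string>> EnumeratedCultureNames =
        new Lazy<HashSet<string>>(() => new HashSet<string>(
            CultureInfo.GetCultures(CultureTypes.AllCultures).Select(c => c.Name),
            StringComparer.OrdinalIgnoreCase));

    // Prefer what the platform reports; consult the hardcoded list only
    // when the platform does not know the name.
    public static bool IsValidCultureString(string name) =>
        name.Length > 0 &&
        (EnumeratedCultureNames.Value.Contains(name) || HardcodedCultureNames.Contains(name));

    public static void Main()
    {
        foreach (string name in new[] { "zh-TW", "zh-CN", "not-a-culture" })
        {
            Console.WriteLine($"{name}: valid={IsValidCultureString(name)}");
        }
    }
}
```

With this shape, an alias missing from the fallback set would still fail on Unix, which is the maintenance concern raised elsewhere in this thread.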
Other Notes
Right now the codepaths for the hardcoded culture names are behind !FEATURE_CULTUREINFO_GETCULTURES. FEATURE_CULTUREINFO_GETCULTURES is only defined when the targetframework starts with net4 or net3.
I’m confused as to why we reflect over the CultureInfo type to call GetCultures in non-net472/net35 scenarios. I’m not sure what this accomplishes? https://github.com/dotnet/msbuild/issues/2349#issuecomment-318161879 suggests that it’s to tap into whatever version of net core we’re on.
https://github.com/dotnet/msbuild/blob/a44cc43931208ecdac42a1023ce79d7b2bd6956e/src/Framework/AssemblyUtilities.cs#L117-L142
The more I dig into this flag, the more I think it can be removed entirely, but I’d like @rainersigwald’s take on that. It looks like the motivation for creating that flag was that early net standard didn’t support that API, but net472 did. The APIs have clearly caught up at this point.
We hear you. There’s no current timeline for it, but in the coming days, we plan to spend some time designing a solution we feel comfortable opening up to a community contribution.
This issue is open to track the fix. Once it is resolved, you’ll be able to do so.
No, these names are not standard names. The standard names are zh-Hans-CN and zh-Hant-TW. You can still create the old-name cultures with CultureInfo.GetCultureInfo("zh-TW").
It shouldn’t matter.
In general, I recommend you create the resources with the cultures zh-Hans and zh-Hant instead. You can do this right now (I mean you don’t have to wait for this issue to get resolved), and the resources will work on Windows and Linux. In addition, these resources will work just fine even if you set the UI culture to zh-CN or zh-TW.
Status update: https://github.com/dotnet/msbuild/pull/6148 is awaiting review/merge 🤞
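The zh-Hans / zh-Hant recommendation above relies on resource lookup walking from a specific culture to its parents. The sketch below prints that parent chain so it can be checked on a given machine; `ParentChainSketch` and the depth guard are assumptions added for illustration, and since the original report notes that parent data can differ by platform, no expected output is claimed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ParentChainSketch
{
    // Returns the culture name followed by each parent name, stopping at
    // the invariant culture (which is not included in the result).
    public static List<string> GetFallbackChain(string name)
    {
        var chain = new List<string>();
        CultureInfo culture = CultureInfo.GetCultureInfo(name);

        // The depth guard is a defensive assumption; real chains are short.
        for (int depth = 0; depth < 10 && !culture.Equals(CultureInfo.InvariantCulture); depth++)
        {
            chain.Add(culture.Name);
            culture = culture.Parent;
        }

        return chain;
    }

    public static void Main()
    {
        foreach (string name in new[] { "zh-CN", "zh-TW" })
        {
            Console.WriteLine($"{name}: {string.Join(" -> ", GetFallbackChain(name))}");
        }
    }
}
```

If zh-Hans appears in the chain for zh-CN (and zh-Hant for zh-TW), resources authored under the neutral names are reachable when the UI culture is set to the older names.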
@chaoyebugao Yes! This is currently blocked by https://github.com/dotnet/msbuild/pull/6148, which is being worked on.
Yeah, this issue is blocked on upgrading our projects from netstandard2.0. Though the PR is out of date, I’ve been using a local branch and making decent progress. Once that’s covered, I’ll jump on this.