msbuild: Cultures aliased by ICU cannot be used for resource localization on non-Windows environments

From @CodingDinosaur on October 12, 2018 3:30

When building or running under .NET Core on a Unix-based environment, certain cultures cannot be utilized for resource localization, such as getting localized strings. This impacts both the build process (e.g., identifying and processing resource files) and the lookup of resources at runtime. These cultures can be used as expected when building or running under Windows.

The affected cultures are those which are aliased by ICU; that is, to save DB space in certain cases, ICU defines some locales as an “alias” of another. There are 42 locale aliases in ICU 57; two of the most common are zh-TW and zh-CN. For a full list of affected locales, see the “ICU Aliased Locale List” section of the CultureIssueDemonstration README.

This platform-inconsistent behavior when localizing certain resources necessitates special code and workarounds at both build and deploy time when developing cross-platform applications.

See a demo of this issue in CodingDinosaur/CultureIssueDemonstration

Symptoms

  • Resource files for the affected locales will not be generated into resource assemblies if building in a Unix-based environment.
    • For example: MyStrings.zh-TW.resx will not have a resource assembly created if the build occurs on a Unix-based environment, but will if built under Windows
  • Resources for the affected locales will not be utilized even if the resource files are present, falling back to the default resources.
    • For example: Consider MyStrings.resx and MyStrings.zh-TW.resx. If a custom build step is used to generate or copy the resource assembly for MyStrings.zh-TW.resx to work around the first issue, requesting a resource using culture zh-TW will still return the resource from the default MyStrings.resx resources.
  • Affected cultures are missing from the results of CultureInfo.GetCultures when running under Unix-based environments.
  • Some culture data for affected cultures is platform inconsistent, notably the parent locale

Expected behavior

  • Primarily, that whatever the “correct” behaviors for these locales are (as it relates to resources and CultureInfo) they be consistent between Windows and Unix-like environments.
  • Secondarily, that aliased locales would “just work” in .NET Core for resource localization - e.g., that we could have resources defined as zh-TW, just as we can on Windows today, and properly retrieve the expected resources.

Brief Analysis

Most of the above symptoms boil down to uloc_getAvailable in ICU’s C API.

For example, the zh-TW resource files do not get copied during build because, during the SplitResourcesByCulture target, the culture is validated against a cache built from CultureInfo.GetCultures, which on Unix ultimately relies on ICU. A diagnostic MSBuild log shows why the file is missing:

Removed Item(s): 
  _MixedResourceWithNoCulture=
    Resources/MyNetCoreProject.MyResources.zh-TW.resx
        OriginalItemSpec=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        TargetPath=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        WithCulture=false

From which we can follow back to the offending path:

Microsoft/msbuild/src/Tasks/Microsoft.Common.CurrentVersion.targets - SplitResourcesByCulture -> Microsoft.Build.Tasks.AssignCulture.Execute -> Microsoft.Build.Tasks.Culture.GetItemCultureInfo -> Microsoft.Build.Tasks.CultureInfoCache.IsValidCultureString -> Microsoft.Build.Shared.AssemblyUtilities.GetAllCultures -> CultureInfo.GetCultures -> CultureData.GetCultures -> CultureData(Unix).EnumCultures -> System.Globalization.Native/locale.cpp:GlobalizationNative_GetLocales https://github.com/dotnet/coreclr/blob/8ba838fb54d6c07271d026b2d77bedcb9e2a786a/src/corefx/System.Globalization.Native/locale.cpp#L162-L171
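To make the failure mode concrete, here is a simplified, hypothetical model of the check at the end of that chain (it is not MSBuild's source): the segment before the extension counts as a culture only if it appears in the set of valid culture names. The valid-name sets are written by hand so the effect of ICU omitting the zh-TW alias can be reproduced on any platform; `GetItemCulture`, `windowsLike`, and `icuLike` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Simplified, hypothetical model of the culture check behind AssignCulture (not MSBuild's
// actual source): the segment before ".resx" counts as a culture only if it is in the
// set of valid names, which MSBuild derives from CultureInfo.GetCultures.
static (string? Culture, bool WithCulture) GetItemCulture(string itemSpec, ISet<string> validCultureNames)
{
    // "Resources/MyNetCoreProject.MyResources.zh-TW.resx" -> "MyNetCoreProject.MyResources.zh-TW"
    string withoutExtension = Path.GetFileNameWithoutExtension(itemSpec);
    int lastDot = withoutExtension.LastIndexOf('.');
    if (lastDot < 0)
        return (null, false);

    string candidate = withoutExtension.Substring(lastDot + 1);
    return validCultureNames.Contains(candidate) ? (candidate, true) : (null, false);
}

// Hand-written stand-ins for what each platform enumerates (illustrative subsets only).
var windowsLike = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "en-US", "zh-Hant-TW", "zh-TW" };
var icuLike = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "en-US", "zh-Hant-TW" };

const string item = "Resources/MyNetCoreProject.MyResources.zh-TW.resx";
Console.WriteLine($"Windows-like: WithCulture={GetItemCulture(item, windowsLike).WithCulture}"); // True
Console.WriteLine($"ICU-like:     WithCulture={GetItemCulture(item, icuLike).WithCulture}");     // False
```

The ICU-like result corresponds to the `WithCulture=false` entry in the diagnostic log above: the file is then treated as a culture-neutral resource instead of producing a zh-TW satellite assembly.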

ICU does not return aliases when getting a list of locales – whether with uloc_getAvailable or Locale::getAvailableLocales (and uloc_countAvailable does not include them in its count).

That ICU does not return the aliases in this manner appears to be intentional, based both on the numerous references to the lack of alias mapping in the uloc documentation and on the following bug:

https://unicode-org.atlassian.net/browse/ICU-4309

uloc_getAvailable returns sr_YU, even though it is an %%ALIAS locale. None of the other %%ALIAS locales are returned.

(Jira history: TracBot changed Resolution to Fixed and Status to Done on 01/Jul/18 1:59 PM.)

ICU-4309 was fixed via https://github.com/unicode-org/icu/commit/ab68bb319601bc467784dcbdcc6d52131a2863d2, which further indicates that ICU not returning aliases from uloc_getAvailable is intentional.

In-Depth Analysis

A full analysis can be seen in the test repo README: CodingDinosaur/CultureIssueDemonstration

Test Repos

I have two test repos that help demonstrate this issue:

CultureIssueDemonstration

  • https://github.com/CodingDinosaur/CultureIssueDemonstration
  • Demonstrates the symptoms described at both build-time and run-time
  • The provided test scripts run the test code on your current platform or on Linux via Docker. Thus it is recommended to run them under Windows with Docker installed to compare the results under both Windows and Linux.
  • Contains a README file that goes into more detail on the issue and the apparent cause

CultureIcuTest

Copied from original issue: dotnet/coreclr#20388

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 13
  • Comments: 62 (38 by maintainers)


Most upvoted comments

Is there any update to this issue, or at least a workaround?

My understanding is, that it’s currently not possible to have an ASP.NET Core application localized with zh- cultures on Linux, which seems like a pretty common use case.

Any updates? There are different behaviors between msbuild and the dotnet CLI on Windows; it bothers me a lot.

Any progress?


me too.

There should be a new daily build available in a few days but the next official release (RC1) won’t be available until September.

I believe there was one more option besides csproj and environment variable: runtimeconfig.json. Is it recommended or not?

In your case, it is not recommended. This one will have the same effect as if you set the property in the csproj.

One last recommendation to @Bartleby2718: try not to set System.Globalization.UseNls in the csproj, and instead use the environment variable on the Windows build machine. Setting System.Globalization.UseNls in the csproj forces the app itself to run using NLS, which I don't think you need. Here we are only trying to work around the resource build issue, not change the app's behavior.
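As a rough illustration of why the two mechanisms differ, the sketch below approximates how the NLS setting might be resolved. It assumes (this is an assumption, not a quote of the runtime source) that the `System.Globalization.UseNls` AppContext switch, which comes from runtimeconfig.json or the project file, takes precedence over the `DOTNET_SYSTEM_GLOBALIZATION_USENLS` environment variable. `ResolveUseNls` is a hypothetical helper; its inputs are injected so the logic can be exercised without changing the process configuration.

```csharp
using System;

// Illustrative approximation, not the runtime's code. Assumption: the AppContext switch
// (runtimeconfig.json / project file) wins; the environment variable ("1" or "true")
// is consulted only when the switch is absent.
static bool ResolveUseNls(Func<string, bool?> getSwitch, Func<string, string?> getEnv)
{
    bool? fromConfig = getSwitch("System.Globalization.UseNls");
    if (fromConfig.HasValue)
        return fromConfig.Value;

    string? fromEnv = getEnv("DOTNET_SYSTEM_GLOBALIZATION_USENLS");
    return fromEnv == "1" || string.Equals(fromEnv, "true", StringComparison.OrdinalIgnoreCase);
}

// Use the values this process actually sees. A process copies its environment at start,
// so a variable changed elsewhere is only visible to processes started afterwards.
bool useNls = ResolveUseNls(
    name => AppContext.TryGetSwitch(name, out bool value) ? value : (bool?)null,
    Environment.GetEnvironmentVariable);
Console.WriteLine($"UseNls (approximation): {useNls}");
```

For an authoritative answer inside a running app, the reflection check over `GlobalizationMode.UseNls` quoted later in this thread reads the runtime's own value.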

@tarekgh Thanks for pointing that out! I have updated my comment above accordingly.

It appears that the changes I made to the environment variable were not applied for some reason. Properly enabling NLS mode answered two of my questions, but I’m still curious about the other two. I’m especially curious about how long .NET will support NLS mode through runtime configuration. Will it survive in .NET 7?

so setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 should have resulted in a difference in CultureInfo.GetCultures(CultureTypes.AllCultures). However, I got the same set of cultures whether I ran the app on the Windows host or inside the Linux container. Around 800 cultures were printed in all cases, but none of them included zh-CN. However, I expected zh-CN to show up when using NLS. Why is this?

Reading these two comments suggests the NLS mode is not enabled correctly. Could you add the following line in your code and send the output?

                Console.WriteLine($".... UseNls:                 {typeof(object).Assembly.GetType("System.Globalization.GlobalizationMode")!.GetProperty("UseNls", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.NonPublic)!.GetValue(null)} ....");

Note: I have updated some of these after @tarekgh let me know (and I confirmed) that the NLS mode is not enabled.

@tarekgh I’ve tried setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1, but the results don’t exactly match what I expected. @BenVillalobos Could you please help me understand the results? I have a list of questions at the bottom.

Command Run in Git Bash (Linux container version)

docker commands to do some cleanup for reproducibility/idempotency; \
cd to the solution root && \
git clean -dfx; \
dotnet publish My.Project -c Release --self-contained -r linux-x64 -o bin -property:SolutionDir=$(pwd) && \
docker build -f My.Dockerfile -t myImage:1 . && \
docker run --mount type=bind,src=$(pwd),dst=/App -d -p 127.0.0.1:6767:80/tcp myImage:1 && \
start chrome "http://localhost:myPort/page-containing-chinese-strings"

Command Run in Git Bash (Windows version)

git clean -dfx; \
our custom command that basically runs msbuild against the solution && \
start chrome "http://localhost:myPort/page-containing-chinese-strings"

Specs for the Windows Machine Used

  • Edition: Windows 10 Enterprise
  • Version: 21H2
  • Installed on: 4/15/2022
  • OS build: 19044.1766
  • Experience: Windows Feature Experience Pack 120.2212.4180.0

Setup

I tried 4 different cases. In each case, I:

  • made sure to set/unset environment variables properly by ~editing environment variables on Windows, restarting the git bash terminal, and setting/unsetting in git bash, to be extra careful~ setting System.Globalization.UseNls appropriately in the relevant csproj file.
  • printed every single culture in CultureInfo.GetCultures(CultureTypes.AllCultures) to a file.

Results

Setting | Windows version | Linux version
~Env var set~ System.Globalization.UseNls true | Chinese strings showed up; ~813~ 857 cultures printed | Chinese strings showed up; 783 cultures printed
~Env var unset~ System.Globalization.UseNls false | Chinese strings showed up; 813 cultures printed | Defaulted to English strings; 783 cultures printed

Miscellaneous Info

  • I have <TargetFramework>net6.0</TargetFramework> in My.Project.csproj.
  • I ensured that routing was done correctly every time.

Questions:

  1. Based on the Windows specs, it appears to me that Windows 10 May 2019 Update must have been installed. (Otherwise, OS build should be below 18362.116, per this page.) This means that my app would use ICU globalization APIs by default, so setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 should have resulted in a difference in CultureInfo.GetCultures(CultureTypes.AllCultures). However, I got the same set of cultures whether I ran the app on the Windows host or inside the Linux container. What’s the reason behind this?
    • Edit: @tarekgh correctly pointed out that the environment variable was not being set. I confirmed that .NET picked up the environment variable only after I restarted the machine. (I’ve tried setting both a user environment variable and a system environment variable.) For faster iteration, I set System.Globalization.UseNls in the relevant .csproj file and observed that CultureInfo.GetCultures(CultureTypes.AllCultures) had more cultures, including zh-CN. This behavior now makes sense to me.
  2. Since we can build on Windows for the time being, I think DOTNET_SYSTEM_GLOBALIZATION_USENLS (or equivalent solutions like using runtimeconfig.json) can be a viable solution for now. Is this setting going to be around for a long time?
  3. Around 800 cultures were printed in all cases, but none of them included zh-CN. However, I expected zh-CN to show up when using NLS. Why is this?
    • Edit: As mentioned above, this is no longer the case, and it behaves as expected.
  4. My.Project does have zh-CN resources, but how is msbuild able to create the zh-CN directory in bin/Debug (or bin/Release) even when zh-CN is not in CultureInfo.GetCultures(CultureTypes.AllCultures)?

Thank you very much in advance, and I look forward to hearing back from you.

Q&A time 🤔

@madelson

Does this need to be set only during runtime, only during the build, or both?

Definitely during build time. Can you describe your runtime scenario? Does this just mean “calling our API for valid cultures?” If so, it applies to both.

With this set, will things work exactly as they do on Windows regarding these cultures or will there still be some discrepancies to be aware of?

If we add this workaround as “if it’s not seen in the culture API, use our hardcoded list as a backup,” I expect Windows and non-Windows to behave the same. The only situation to worry about would be a culture alias that isn’t in the hardcoded list, which would still fail on Unix. We’d need to handle those on a case-by-case basis.

Are there any timing implications for setting this variable? Is this the kind of variable that needs to be set before process start? If this gets set at app runtime, can it be set in Main()? Later? I know sometimes with such things you have to set the value before a bunch of stuff gets cached and locked in.

I believe it’ll work as long as the env var is set by the time our API gets called. Note, though, that ValidCultureNames is loaded into a static HashSet within CultureInfoCache, so once that cache has been populated, later changes to the variable won’t be picked up.
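The caching caveat can be illustrated with a small, hypothetical example (not MSBuild code): a lazily built, process-wide set snapshots its inputs on first use, so an environment variable set afterwards does not change the cached result. `HYPOTHETICAL_USE_CULTURE_ALIASES` is an invented variable name used only for this demonstration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: built once, on first access, from whatever the environment says then.
var validCultureNames = new Lazy<HashSet<string>>(() =>
{
    var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "en-US", "zh-Hant-TW" };
    if (Environment.GetEnvironmentVariable("HYPOTHETICAL_USE_CULTURE_ALIASES") == "1")
        names.Add("zh-TW");
    return names;
});

Environment.SetEnvironmentVariable("HYPOTHETICAL_USE_CULTURE_ALIASES", null);
bool before = validCultureNames.Value.Contains("zh-TW");   // the cache is built here
Environment.SetEnvironmentVariable("HYPOTHETICAL_USE_CULTURE_ALIASES", "1");
bool after = validCultureNames.Value.Contains("zh-TW");    // still the original snapshot
Console.WriteLine($"before={before}, after={after}");      // before=False, after=False
```

In practice this means setting the variable before the build process starts, rather than from code that runs after culture validation may already have happened.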

Would this be supported long-term or only for a set number of releases?

cc @marcpopMSFT . The situation that concerns me is getting a flood of “we need this alias to be supported” in the long term, which isn’t very maintainable.

As far as timing, I assume this would go into .NET 7; any chance it would also be back-ported to .NET 6?

@marcpopMSFT

@Bartleby2718

Could you give me a rough ETA on this issue? Will there be a new release including this fix in, say, the next few weeks?

Starting up a PR for it as soon as this comment gets posted.

will it be compatible with .NET 6 as well, or does it require an upgrade to .NET 7?

It’s a new codepath within our binaries, so it’d need an upgrade to net7 unless we backport.

Makes sense. I’ll try that as I wait for the response to other questions in this thread.

@tarekgh whoops! Thanks @CodingDinosaur ❤️

CodingDinosaur/CultureIssueDemonstration#apparent-root-cause is a fantastic writeup, thanks again Tarek.

Just clarify this is written by @CodingDinosaur 😃

investigation notes:

https://github.com/CodingDinosaur/CultureIssueDemonstration#apparent-root-cause is a fantastic writeup, thanks @codingdinosaur!

Off the top of my head, a valid workaround would be to allow an opt-in (environment variable?) that follows the older hardcoded culture code path. In this case, something like:

if (Environment.GetEnvironmentVariable("UseHardcodedCultureAliases") == "1")
{
	return HardcodedCultureNames;
}

The Quick Workaround

  1. Remove the FEATURE_CULTUREINFO_GETCULTURES flag from https://github.com/dotnet/msbuild/blob/a44cc43931208ecdac42a1023ce79d7b2bd6956e/src/Tasks/CultureInfoCache.cs#L62-L68
  2. Add the if statement above right around here (it should be a flag in Traits.cs rather than a direct environment-variable check): https://github.com/dotnet/msbuild/blob/a44cc43931208ecdac42a1023ce79d7b2bd6956e/src/Tasks/CultureInfoCache.cs#L26-L32

Better yet: it should check the hardcoded list as a backup if the culture isn’t seen during GetAllCultures. This would help us avoid needing to update the list with all supported cultures over time.
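A minimal sketch of the “hardcoded list as a backup” idea, assuming a case-insensitive comparison. The alias array is an illustrative subset chosen for this example, not the list MSBuild would ship, and this `IsValidCultureString` is a standalone stand-in for the CultureInfoCache method of the same name, not its actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Illustrative subset of aliased names; not the list MSBuild ships.
string[] hardcodedAliases = { "zh-CN", "zh-TW", "zh-HK", "zh-MO", "zh-SG" };

// Accept a name if the runtime enumerates it; otherwise fall back to the hardcoded aliases.
static bool IsValidCultureString(string name, ISet<string> enumerated, IEnumerable<string> fallback) =>
    enumerated.Contains(name) || fallback.Contains(name, StringComparer.OrdinalIgnoreCase);

// Whatever the current platform enumerates (under ICU this typically omits aliases such as zh-TW).
var runtimeCultureNames = new HashSet<string>(
    CultureInfo.GetCultures(CultureTypes.AllCultures).Select(c => c.Name),
    StringComparer.OrdinalIgnoreCase);

Console.WriteLine($"zh-TW enumerated by runtime: {runtimeCultureNames.Contains("zh-TW")}");
Console.WriteLine($"zh-TW valid with fallback:   {IsValidCultureString("zh-TW", runtimeCultureNames, hardcodedAliases)}");
```

Because the enumerated set is checked first, the hardcoded list only has to cover names the runtime omits, which is what keeps it from needing updates for every newly supported culture.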

Will bring this up with the team. @madelson, would that be an acceptable workaround for you? It would require setting an environment variable on Unix machines.

Other Notes

Right now the codepaths for the hardcoded culture names are behind !FEATURE_CULTUREINFO_GETCULTURES. FEATURE_CULTUREINFO_GETCULTURES is only defined when the target framework starts with net4 or net3.

I’m confused as to why we reflect over the CultureInfo type to call GetCultures in non-net472/net35 scenarios; I’m not sure what this accomplishes. https://github.com/dotnet/msbuild/issues/2349#issuecomment-318161879 suggests that it’s to tap into whatever version of .NET Core we’re on.

https://github.com/dotnet/msbuild/blob/a44cc43931208ecdac42a1023ce79d7b2bd6956e/src/Framework/AssemblyUtilities.cs#L117-L142

The more I dig into this flag, the more I think it can be removed entirely, but I’d like @rainersigwald’s take on that. It looks like the flag was created because early .NET Standard didn’t support that API, but net472 did. The APIs have clearly caught up at this point.

Is there a milestone or timeline for this? Is the issue suitable for a community contribution to accelerate?

We hear you. There’s no current timeline for it, but in the coming days we plan to spend some time designing a solution we feel comfortable opening up to a community contribution.

As it stands, can we expect zh-CN/zh-TW resources to be compiled and loaded at runtime under Linux if the culture is set to CultureInfo.GetCultureInfo("zh-TW")?

This issue is open to fix exactly that. Once it is resolved, you’ll be able to do so.

Can we expect CultureInfo.GetCultures(CultureTypes.AllCultures) to start returning these cultures on Linux at some point?

No, these names are not standard names. The standard names are zh-Hans-CN and zh-Hant-TW. You still can create the old name cultures as CultureInfo.GetCultureInfo("zh-TW").
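The distinction between a culture being enumerated and a culture being creatable can be checked directly. The sketch below probes a few names with CultureInfo.GetCultures and CultureInfo.GetCultureInfo. Because the results depend on the platform (ICU vs. NLS) and on whether invariant globalization mode is enabled, it prints them rather than assuming them; `Probe` is an illustrative helper.

```csharp
using System;
using System.Globalization;
using System.Linq;

// (a) Is the name returned by CultureInfo.GetCultures? (b) Can it be created by name?
static (bool Enumerated, bool Creatable) Probe(string name)
{
    bool enumerated = CultureInfo.GetCultures(CultureTypes.AllCultures)
        .Any(c => string.Equals(c.Name, name, StringComparison.OrdinalIgnoreCase));

    bool creatable;
    try
    {
        CultureInfo.GetCultureInfo(name);
        creatable = true;
    }
    catch (CultureNotFoundException)
    {
        // For example, invariant globalization mode rejecting non-invariant names.
        creatable = false;
    }
    return (enumerated, creatable);
}

foreach (string name in new[] { "zh-TW", "zh-Hant-TW", "zh-CN", "zh-Hans-CN" })
{
    var (isEnumerated, isCreatable) = Probe(name);
    Console.WriteLine($"{name,-11} enumerated={isEnumerated,-5} creatable={isCreatable}");
}
```

On an ICU-based runtime one would expect the alias names to show `enumerated=False` while still being creatable, which is exactly the gap that an enumeration-based validity check falls into.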

Does it matter whether we build on Windows or Linux or only where we run? Should it matter?

It shouldn’t matter.

In general, I recommend you create the resources with the cultures zh-Hans and zh-Hant instead. You can do this now (you don’t have to wait for this issue to be resolved), and these resources will work on both Windows and Linux. In addition, these resources will work just fine even if you set the UI culture to zh-CN or zh-TW.
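Why zh-Hant resources can serve zh-TW comes down to the parent chain: ResourceManager looks for resources for the requested UI culture and then for each of its parents, ending at the default (neutral) resources. The sketch below prints that chain for a culture by walking CultureInfo.Parent. It only approximates the real fallback (it ignores NeutralResourcesLanguageAttribute, for instance), and since parent data is one of the platform-inconsistent properties listed in the symptoms above, the output should be checked on each target platform rather than assumed. `ResourceProbeOrder` is an illustrative helper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Cultures probed for resources, most specific first, ending at the default resources.
static List<string> ResourceProbeOrder(CultureInfo culture)
{
    var order = new List<string>();
    for (CultureInfo current = culture; ; current = current.Parent)
    {
        if (current.Name.Length == 0)
        {
            // The invariant culture's resources are the default (neutral) resources.
            order.Add("(default resources)");
            return order;
        }
        order.Add(current.Name);
    }
}

foreach (string name in new[] { "zh-TW", "zh-CN" })
{
    try
    {
        Console.WriteLine($"{name}: {string.Join(" -> ", ResourceProbeOrder(CultureInfo.GetCultureInfo(name)))}");
    }
    catch (CultureNotFoundException)
    {
        Console.WriteLine($"{name}: not available in this globalization mode");
    }
}
```

If zh-Hant appears in the chain printed for zh-TW, a MyStrings.zh-Hant.resx satellite would be found for that UI culture even though no zh-TW resources exist.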

Status update: https://github.com/dotnet/msbuild/pull/6148 is awaiting review/merge 🤞

@chaoyebugao Yes! This is currently blocked by https://github.com/dotnet/msbuild/pull/6148, which is being worked on.

Yeah, this issue is blocked on upgrading our projects from netstandard2.0. Though the PR is out of date, I’ve been using a local branch and making decent progress. Once that’s covered, I’ll jump on this.