globalize: CLDR data is reported as missing when locale is set with a region

When I try to use Globalize with a language-region combination (like en-US), it always gives me the error: Error: E_MISSING_CLDR: Missing required CLDR content 'main/en/dates/calendars/gregorian/dateTimeFormats/short'.

The basic code is as follows:

Globalize.load({
  "main": {
    "en-US": {
      // ... content omitted ...
    }
  }
});
Globalize.locale('en-US');
var formatter = Globalize.dateFormatter({ datetime: 'short' });

At that point, Globalize always throws the error. However, if I provide CLDR data that is not region-dependent (just plain ‘en’) and set my locale to match, the code works. In both cases, Globalize looks for CLDR data under the language code alone, ignoring the region when building the CLDR path. I have tested de-DE and pt-BR and see the same results: it only looks for the language-level CLDR data and ignores the language-REGION data that was provided.

I saw an issue on cldrjs implying that the problem could be solved by using just en instead of en-US (which I am doing as a stopgap), but the same issue occurs for any region-specific locale, and using just the language won’t always be the right thing to do.

I’m not sure whether this issue belongs here or on cldrjs. Let me know if I need to move it or provide more information.

About this issue

  • State: closed
  • Created 10 years ago
  • Reactions: 1
  • Comments: 27 (17 by maintainers)

Most upvoted comments

Hi @arbrown, your statement is not precisely correct, although I agree such behavior goes against common sense.

Globalize is clever enough to figure out the succinct form of any locale, called languageId in the table below.

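| locale | languageId | maxLanguageId | language | script | region |
|--------|------------|---------------|----------|--------|--------|
| en     | en         | en-Latn-US    | en       | Latn   | US     |
| en-US  | en         | en-Latn-US    | en       | Latn   | US     |
| de     | de         | de-Latn-DE    | de       | Latn   | DE     |
| zh     | zh         | zh-Hans-CN    | zh       | Hans   | CN     |
| zh-TW  | zh-TW      | zh-Hant-TW    | zh       | Hant   | TW     |
| ar     | ar         | ar-Arab-EG    | ar       | Arab   | EG     |
| pt     | pt         | pt-Latn-BR    | pt       | Latn   | BR     |
| pt-BR  | pt         | pt-Latn-BR    | pt       | Latn   | BR     |
| pt-PT  | pt-PT      | pt-Latn-PT    | pt       | Latn   | PT     |
| es     | es         | es-Latn-ES    | es       | Latn   | ES     |
| es-AR  | es-AR      | es-Latn-AR    | es       | Latn   | AR     |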
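The languageId column can be reproduced with a simplified version of the "Add Likely Subtags" and "Remove Likely Subtags" procedures from UTS #35. This is only an illustrative sketch, not the cldrjs implementation: `LIKELY_SUBTAGS` is a hypothetical, hand-picked excerpt covering just the rows above (the real data lives in CLDR's supplemental likelySubtags.json), and the helper names `parseLocale`, `joinLocale`, `maximize`, and `minimize` are made up for this example. Variants, extensions, and the "und" fallbacks are ignored.

```javascript
// Simplified sketch of UTS #35 "Add/Remove Likely Subtags" (not cldrjs code).
// LIKELY_SUBTAGS is a tiny hand-picked excerpt; the real data is CLDR's
// supplemental likelySubtags.json. Variants, extensions and "und" are ignored.
const LIKELY_SUBTAGS = {
  en: "en-Latn-US",
  de: "de-Latn-DE",
  pt: "pt-Latn-BR",
  es: "es-Latn-ES",
  ar: "ar-Arab-EG",
  zh: "zh-Hans-CN",
  "zh-TW": "zh-Hant-TW",
  "zh-Hant": "zh-Hant-TW",
};

// Split "lang[-Script][-REGION]" (with "-" or "_") into its subtags.
function parseLocale(locale) {
  const [language, ...rest] = locale.split(/[-_]/);
  let script;
  let region;
  for (const part of rest) {
    if (/^[A-Za-z]{4}$/.test(part)) script = part;
    else if (/^([A-Za-z]{2}|\d{3})$/.test(part)) region = part;
  }
  return { language, script, region };
}

function joinLocale({ language, script, region }) {
  return [language, script, region].filter(Boolean).join("-");
}

// "Add Likely Subtags": fill in the missing script/region from the first match.
function maximize(locale) {
  const { language, script, region } = parseLocale(locale);
  const lookupKeys = [
    joinLocale({ language, script, region }),
    joinLocale({ language, region }),
    joinLocale({ language, script }),
    language,
  ];
  for (const key of lookupKeys) {
    const match = LIKELY_SUBTAGS[key];
    if (match) {
      const likely = parseLocale(match);
      return joinLocale({
        language,
        script: script || likely.script,
        region: region || likely.region,
      });
    }
  }
  return joinLocale({ language, script, region }); // no data: unchanged
}

// "Remove Likely Subtags": the first trial (language, language-region,
// language-script) that maximizes back to the same value is the succinct form.
function minimize(locale) {
  const max = maximize(locale);
  const { language, script, region } = parseLocale(max);
  const trials = [language, joinLocale({ language, region }), joinLocale({ language, script })];
  for (const trial of trials) {
    if (maximize(trial) === max) return trial;
  }
  return max;
}

minimize("en-US"); // "en"
minimize("zh-TW"); // "zh-TW"
maximize("pt-PT"); // "pt-Latn-PT"
```

This shows why en-US collapses to en (dropping US changes nothing after maximization) while pt-PT stays pt-PT (dropping PT would maximize to pt-Latn-BR instead).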

You should load CLDR JSON files using the locale in its succinct form. Therefore, load en when you want en, en-US, en-Latn, or en-Latn-US (they all resolve to the same languageId, en). But load en-GB when you want English as spoken in the United Kingdom, or en-IN when you want English as spoken in India.

You can figure out what the succinct form is via [1] or [2]:

  1. By looking at the error message. https://gist.github.com/rxaviers/aeb955c22c51d9c172a7
  2. By looking at a Globalize variable. https://gist.github.com/rxaviers/bb143a6715d1392ecc96
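Way [1] boils down to reading the path segment right after main/ in the message. Below is a hypothetical helper (not part of Globalize) that extracts it, assuming the message keeps the quoted-path format shown in the original report.

```javascript
// Hypothetical helper: extract the languageId from an E_MISSING_CLDR message,
// assuming it quotes a path of the form 'main/<languageId>/...'.
function languageIdFromMissingCldr(message) {
  const match = /'main\/([^/']+)\//.exec(message);
  return match ? match[1] : null; // null when no such path is quoted
}

const message =
  "E_MISSING_CLDR: Missing required CLDR content " +
  "'main/en/dates/calendars/gregorian/dateTimeFormats/short'.";
languageIdFromMissingCldr(message); // "en": load your data under main/en
```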

A little more detail

Globalize (via cldrjs) figures out the correct languageId, which is used when traversing CLDR paths. As explained in the issue you referenced (https://github.com/rxaviers/cldrjs/issues/17#issuecomment-48643041), it looks up paths as main/{languageId}/..., and languageId is always in the succinct form, obtained by removing the likely subtags from maxLanguageId according to the specification.
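To make the failure concrete, here is a hypothetical sketch of that kind of path resolution (it is not the actual cldrjs code): placeholders such as {languageId} are substituted first, then the loaded JSON is walked one segment at a time. The "example-pattern" value is a placeholder, not real CLDR content.

```javascript
// Hypothetical sketch of cldrjs-style path resolution (not the real cldrjs
// code): substitute "{attribute}" placeholders, then walk the loaded JSON.
function resolvePath(data, template, attributes) {
  const path = template.replace(/\{(\w+)\}/g, (_, name) => attributes[name]);
  return path.split("/").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    data
  );
}

const TEMPLATE = "main/{languageId}/dates/calendars/gregorian/dateTimeFormats/short";
// For the locale "en-US", languageId is the succinct form "en".
const attributes = { languageId: "en" };

// "example-pattern" is a placeholder, not real CLDR content.
const gregorian = { dateTimeFormats: { short: "example-pattern" } };
const loadedAsEnUS = { main: { "en-US": { dates: { calendars: { gregorian } } } } };
const loadedAsEn = { main: { en: { dates: { calendars: { gregorian } } } } };

resolvePath(loadedAsEnUS, TEMPLATE, attributes); // undefined: the E_MISSING_CLDR case
resolvePath(loadedAsEn, TEMPLATE, attributes);   // "example-pattern"
```

Data loaded under "en-US" is never visited, because the resolved path asks for main/en.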

To-Do

Improving documentation for clarity is an obvious step that could be taken.

  • Improve documentation for clarity.

I’m open to hearing any other suggestions.

This will be a huge stumbling block for users. While docs will help, I don’t think it’s the right solution.

It seems we could take two different approaches.

  1. Check the provided locale first, and if not found, then check the succinct form.
  2. Make Globalize.load() smarter so that it will normalize the data for you. For example, if you load en-US and there is no en locale defined, then store the data in en instead.
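A rough sketch of what option 2 might look like, assuming the load step can compute a languageId for each key under "main". `normalizeMain`, `toLanguageId`, and `alreadyLoaded` are hypothetical names, not Globalize API; the mapping below is a stand-in for a real likely-subtags computation.

```javascript
// Hypothetical sketch of option 2 (not Globalize API): re-key each entry of a
// CLDR JSON "main" object under its succinct languageId at load time.
// `toLanguageId` maps e.g. "en-US" -> "en"; `alreadyLoaded` stands in for
// whatever "main" data earlier load calls stored.
function normalizeMain(json, toLanguageId, alreadyLoaded = {}) {
  const input = json.main || {};
  const main = {};
  for (const [locale, content] of Object.entries(input)) {
    const id = toLanguageId(locale);
    // Keep the original key if the succinct key is already defined,
    // either in earlier data or elsewhere in this same payload.
    const idTaken = id !== locale && (id in alreadyLoaded || id in input);
    main[idTaken ? locale : id] = content;
  }
  return { ...json, main };
}

// Stand-in mapping for illustration only.
const toLanguageId = (locale) => ({ "en-US": "en", "pt-BR": "pt" }[locale] || locale);
Object.keys(normalizeMain({ main: { "en-US": {} } }, toLanguageId).main); // ["en"]
```

Note that the data keyed "en-US" ends up stored as "en" only when nothing else already claims "en", matching the rule described in option 2.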

Option 1 seems faster and safer. What are your thoughts about that @rxaviers?
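Option 1 could be sketched as a two-step lookup: try the locale exactly as the user set it, then fall back to its succinct languageId. This is a hypothetical illustration; `lookupMain` is not a Globalize or cldrjs function, and "example-pattern" is placeholder data.

```javascript
// Hypothetical sketch of option 1 (not Globalize API): look the path up under
// the locale exactly as given, then fall back to the succinct languageId.
function lookupMain(data, locale, languageId, subpath) {
  for (const id of new Set([locale, languageId])) {
    let node = data.main ? data.main[id] : undefined;
    for (const key of subpath.split("/")) {
      if (node == null) break; // this candidate is missing part of the path
      node = node[key];
    }
    if (node != null) return node;
  }
  return undefined;
}

const SUBPATH = "dates/calendars/gregorian/dateTimeFormats/short";
const gregorianData = { dateTimeFormats: { short: "example-pattern" } };
const loaded = { main: { "en-US": { dates: { calendars: { gregorian: gregorianData } } } } };

lookupMain(loaded, "en-US", "en", SUBPATH); // "example-pattern" (exact key)
lookupMain(loaded, "en-GB", "en-GB", SUBPATH); // undefined
```

Trying the exact key first means data loaded the way the original report did is found directly, while data loaded under the succinct form is still reached through the fallback.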