Channel: Generated Content by David Storey

Internationalization API tips and tricks

In my previous post, I introduced the Internationalization API. In this follow-up, I will show some tips and tricks for using the API and for working around issues I found while writing that post.

Setting the locale based on the document language

In the previous blog post, I mostly set the locale by assigning it to a variable at the top of the JS file. Outside of small demos, it would be best not to hardcode the locale.

One approach would be to set the lang attribute (or xml:lang for XML documents, such as SVG), and then use this to set the locale.

var root = document.documentElement,
    lang = root.lang || root.getAttributeNS("http://www.w3.org/XML/1998/namespace", "lang") || "en",
    day = new Intl.DateTimeFormat(lang, { 
                    weekday: "long" 
                });

Set the locale based on the document language

This takes the language tag from the HTML lang attribute if it exists, otherwise from the xml:lang attribute, and if all else fails, uses a fallback language.

This assumes you are formatting something in the same language as the document itself. If you have a multilingual document, you may want to check for a language tag set in a lang attribute closer to where you will be inserting the result. You’ll also probably want to listen for changes to the attribute value if you have an app where the language can be changed.
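One way to find the closest language tag is to walk up from the element where the result will be inserted. The following is only a minimal sketch: findLang is a hypothetical helper name, and it relies on nothing more than the lang property, getAttributeNS and parentNode.

```javascript
// Hypothetical helper: find the nearest language tag at or above a node.
function findLang(node, fallback) {
    var xmlNs = "http://www.w3.org/XML/1998/namespace",
        lang;

    while (node) {
        // HTML elements expose lang; XML documents may use xml:lang instead
        lang = node.lang || (node.getAttributeNS && node.getAttributeNS(xmlNs, "lang"));

        if (lang) {
            return lang;
        }

        node = node.parentNode;
    }

    // Nothing found on the node or any ancestor
    return fallback;
}
```

In a page this might be called as findLang(someElement, "en"), where someElement is whatever element the formatted value will be written into.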

Use the API even for monolingual apps and sites

While the raison d’être of an internationalisation API is supporting multiple locales, do not dismiss it if you are writing a monolingual app.

For one, you can use methods such as toLocaleString using the language of the page in question, rather than that of the browser, which is the default when no locale is specified.

The API also gives you much more control over formatting and collation options, such as varying the strictness of matching strings, setting minimum significant digits, displaying the month in its long, short or narrow form, and so on.
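As a rough illustration of those options, the sketch below uses a single hardcoded locale. The en-GB tag and the sample values are assumptions chosen for the example; the option names (sensitivity, minimumSignificantDigits, month, timeZone) are the ones defined by the API.

```javascript
var lang = "en-GB", // assumed page language for this sketch
    date = new Date(Date.UTC(2013, 10, 9)),
    collator = new Intl.Collator(lang, { sensitivity: "base" }),
    number = new Intl.NumberFormat(lang, { minimumSignificantDigits: 3 }),
    month = new Intl.DateTimeFormat(lang, { month: "long", timeZone: "UTC" });

// 0: with "base" sensitivity, differences in case are ignored
console.log(collator.compare("a", "A"));

// "5.00": padded to three significant digits
console.log(number.format(5));

// "November": the month in its long form
console.log(month.format(date));
```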

Use smart fallbacks

Not all languages and regions are supported by all browsers, but it is possible to supply a list of locales, or fix up flaky support.

Unsupported languages

If you need to use a language that is not supported by a browser, you can use a fallback by providing an array. Find languages that use the same customs as the language that isn’t supported.

For example, no (Norwegian macrolanguage) is not supported by Chrome or Firefox, but if you want to collate according to Norwegian customs you can give a fallback of Norwegian Bokmål, then Norwegian Nynorsk, and if all else fails, Danish (which uses the same sort order):

new Intl.Collator(["no", "nb", "nn", "da"]);

Be careful though; just because a locale behaves the same as another for one type of data doesn’t mean it does for all. Testing is key here. You can check the CLDR data (by locale or by type), but browser implementations don’t always agree with it.
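To see which entry in the list a browser ended up using, you can inspect the resolved locale. This is a small sketch using only supportedLocalesOf and resolvedOptions; the result differs between browsers, so no output is shown.

```javascript
var wanted = ["no", "nb", "nn", "da"],
    collator = new Intl.Collator(wanted);

// The subset of the list that the browser claims to support
console.log(Intl.Collator.supportedLocalesOf(wanted));

// The locale that was actually picked (the default locale if none matched)
console.log(collator.resolvedOptions().locale);
```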

Unsupported region

While there are not too many major languages that are missing, Chrome does not support many regional variations, outside of British and American English; European and Brazilian Portuguese; and multiple Spanish variations. In these cases the browser will likely fall back to the base language. This is a problem for dates in English, as the US uses an almost unique date format. It is also a problem for countries such as Switzerland, which use a number format common to all languages in that country, rather than the format of the country the base language comes from.

Don’t be afraid to use a different locale in place of the one you need if it is not supported and the base language doesn’t give the correct formatting. Chrome does not make it easy to detect whether the right locale is being used, because supportedLocalesOf does not normalise the locale to the form that is actually used. It is still possible, by checking the locale property of resolvedOptions() after you have created a new instance:

var locale = "en-AU",
    country = locale.split("-")[1],
    usedLocale = null,
    formatter = null;

// Gah, Chrome says supported but it is not
console.log(Intl.DateTimeFormat.supportedLocalesOf(locale));

formatter = new Intl.DateTimeFormat(locale);
usedLocale = formatter.resolvedOptions().locale;

// Set to British English if the country is not included and lang is English
if (usedLocale.indexOf("en") === 0 && usedLocale.indexOf("-" + country) === -1) {
    formatter = new Intl.DateTimeFormat("en-GB");
}

Use an alternative fallback if only the base language is used

This won’t work exactly as I wrote it if a script tag or Unicode extension is included in the locale, but it can be made smarter, or you can just hardcode the locale. Be careful with Canada too, as the US format is sometimes used there.
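One way to make the region check smarter is to pull the region subtag out with a regular expression instead of splitting on the first hyphen. This is only a sketch: getRegion is a hypothetical helper, and it deliberately ignores rarer forms such as extended language subtags.

```javascript
// Hypothetical helper: extract the region subtag from a language tag.
// Handles an optional four-letter script subtag and ignores anything after
// the region, such as variants or Unicode extensions.
function getRegion(tag) {
    var match = /^[a-z]{2,3}(?:-[a-z]{4})?-([a-z]{2}|\d{3})(?:-|$)/i.exec(tag);

    return match ? match[1].toUpperCase() : null;
}

console.log(getRegion("en-AU"));              // "AU"
console.log(getRegion("zh-Hant-TW"));         // "TW"
console.log(getRegion("ar-SD-u-ca-gregory")); // "SD"
console.log(getRegion("en"));                 // null
```

The earlier check could then compare getRegion(locale) with getRegion(usedLocale) rather than searching the string for "-" + country.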

Don’t rely too much on default date formatting

Avoid numerical month formatting

As can be seen in the previous example, displaying the month as a number can be troublesome if the date is formatted for a different locale than the one the reader is expecting. Some people suggest using the ISO date format, but outside of people from countries that use that format, or geeks like us who know it, it too could be confusing; what is to stop people used to US formatting from assuming the date is written backwards for some reason, and that the month is in the last position?

A much clearer approach is just to display the month alphabetically. As we have locale information at our disposal, the drawback of needing to translate the month is no longer an issue:

var locale = "en-AU",
    formatter = new Intl.DateTimeFormat(locale, {
        day: "2-digit",
        month: "short",
        year: "numeric"
    });

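// Chrome: “Nov 08, 2013”, IE11 “08 Nov 2013”
// (the 8th rather than the 9th, presumably because the date string is parsed
// as UTC midnight and the output was produced in a time zone behind UTC)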
console.log(formatter.format(new Date("2013-11-09")));

Use alphabetical month

If you do not combine this with the previous tip to fix the formatting order, you will still not get a perfectly formatted date in many English-speaking countries, but at least there will be no confusion.

A drawback that does exist is that, as the month is not numerical, it may not be understandable if you’re using a language tag that is not supported (region/country subtags are fine, as the browser will fall back to the correct language). This can be mitigated by using smart fallback languages (as covered previously) or by only using alphabetical months for languages you know are supported.

Consider specifying the calendar when using Arabic

A number of different calendars can be used with Arabic. While Chrome uses the Gregorian calendar for its only ar locale, and the preview build of Firefox does the same for all Arabic locales, IE11 uses the Islamic calendar for both Arabic/Saudi Arabia and base Arabic. If you are just providing one Arabic translation, or IE falls back due to unsupported Arabic locales (IE doesn’t support localisations for sub-Saharan African Arabic locales), the user may not get the date they expect. The inconsistency with Saudi Arabian Arabic between browsers also isn’t ideal.

Instead you can provide the calendar in the locale when using Arabic. In the following example, IE will fall back to the ar locale, but will use the Gregorian calendar rather than the default Islamic:

//  ٨‏/١١‏/٢٠١٣
var formatter = new Intl.DateTimeFormat("ar-SD-u-ca-gregory");
console.log(formatter.format(new Date("2013-11-09")));

Set calendar to Gregorian

If instead, you want to use the Islamic calendar in all browsers you can specify that with the islamic value:

// ٥‏/١‏/١٤٣٥
var formatter = new Intl.DateTimeFormat("ar-SA-u-ca-islamic");
console.log(formatter.format(new Date("2013-11-09")));

Set calendar to Islamic

Correct the numbering system for certain Arabic locales in Chrome

While Arabic generally uses Arabic-Indic digits (the arab numbering system), Latin digits are used in Morocco (ma), Algeria (dz), and Tunisia (tn). As Blink only supports the base ar locale, it will format numbers and dates incorrectly for these locales. This can be fixed by specifying the numbering system in the locale for those countries:

var formatter = new Intl.NumberFormat("ar-DZ-u-nu-latn");

// "123.456.789,34" rather than "١٢٣٬٤٥٦٬٧٨٩٫٣٤"
console.log(formatter.format(123456789.34));

Specify Latin digits with the locale
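If the locale is not hardcoded, the extension could be appended only for the affected countries. The helper below is a hypothetical sketch, not something the API provides; it assumes a plain two-part tag such as ar-DZ and leaves everything else untouched.

```javascript
// Regions named above where Arabic is written with Latin digits
var LATIN_DIGIT_REGIONS = ["MA", "DZ", "TN"];

// Hypothetical helper: add the Latin numbering system to ar-MA, ar-DZ and ar-TN.
function withLatinDigits(locale) {
    var parts = locale.split("-");

    if (parts.length === 2 &&
            parts[0].toLowerCase() === "ar" &&
            LATIN_DIGIT_REGIONS.indexOf(parts[1].toUpperCase()) !== -1) {
        return locale + "-u-nu-latn";
    }

    return locale;
}

console.log(withLatinDigits("ar-DZ")); // "ar-DZ-u-nu-latn"
console.log(withLatinDigits("ar-SA")); // "ar-SA"
```

The result can be passed straight to a constructor, for example new Intl.NumberFormat(withLatinDigits(locale)).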

Fix incorrect symbols and spaces

The preview release of IE11 has a number of formatting issues that I’ve found. Although these can be worked around, it is worth noting that IE11 has not been released yet, so they could be fixed before the final release.

Arabic commas

IE11 includes a regular comma (,) when formatting Arabic numbers. It should use the Arabic comma (٬) instead. This can be fixed with a simple find and replace.

In the following example, I test whether the used locale is either ar (Arabic) or starts with ar- (Arabic plus some additional locale info, such as a country tag), and whether a comma exists in the formatted number. If a different locale is used, or the problem doesn’t exist, the fix will not be applied. Note that you cannot just check that the locale starts with ar, as that may match three-letter language codes, such as arn.

var locale = "ar",
    formatter = new Intl.NumberFormat(locale),
    usedLocale = formatter.resolvedOptions().locale,
    formattedNum = formatter.format(100000000000.100);

if ((usedLocale === "ar" || usedLocale.indexOf("ar-") === 0) && formattedNum.indexOf(",") !== -1) {
    formattedNum = formattedNum.replace(/,/g, "٬");
}

console.log("Fixed: " + formattedNum);

Replace comma with Arabic comma

Arabic percentage sign

Similarly, IE11 has an issue where it uses a regular percentage sign (%), rather than an Arabic percentage sign (٪). This can be fixed in the same way. I’ve changed the previous code to check for both:

if ((usedLocale === "ar" || usedLocale.indexOf("ar-") === 0)) {
    if (formattedNum.indexOf(",") !== -1) {
        formattedNum = formattedNum.replace(/,/g, "٬");
    }

    // assumes only one percentage sign in the string
    if (formattedNum.indexOf("%") !== -1) {
        formattedNum = formattedNum.replace("%", "٪");
    }
}

Replace percentage sign with Arabic percentage sign

Of course, in a real app you’d probably want to put this in a function. You may also want to check to make sure that the style is set to percent and useGrouping is truthy, but I didn’t bother, as I figured these characters wouldn’t be included otherwise.
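A minimal sketch of such a function follows. The name fixArabicSymbols is made up for this example, and unlike the code above it replaces every percentage sign rather than only the first. The replacement characters are written as escapes for the Arabic comma and percentage sign shown earlier.

```javascript
// Hypothetical helper: swap the regular comma and percentage sign for their
// Arabic equivalents, but only when an Arabic locale was actually used.
function fixArabicSymbols(formattedNum, usedLocale) {
    var isArabic = usedLocale === "ar" || usedLocale.indexOf("ar-") === 0;

    if (!isArabic) {
        return formattedNum;
    }

    return formattedNum
        .replace(/,/g, "\u066C")  // Arabic comma (thousands separator)
        .replace(/%/g, "\u066A"); // Arabic percentage sign
}

var formatter = new Intl.NumberFormat("ar", { style: "percent" }),
    usedLocale = formatter.resolvedOptions().locale;

console.log("Fixed: " + fixArabicSymbols(formatter.format(1234.5), usedLocale));
```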

Non-breaking spaces

When formatted numbers have spaces, such as the grouping separator in the French locale, or when a unit is included, such as a percentage sign or currency symbol, it is important to use a non-breaking space. This keeps the whole number and any unit together when a line wraps. Allowing them to break onto separate lines can cause confusion, and generally looks unprofessional.

Fortunately, all browsers that implement the Internationalization API include non-breaking spaces for the grouping separator, but IE11 includes a regular space before (or after) the percentage sign (if a space is used at all) or the currency unit. This can also be fixed with a simple find and replace:

if (formattedNum.indexOf(" ") !== -1) {
    // "\u00A0" is the no-break space
    formattedNum = formattedNum.replace(/ /g, "\u00A0");
}

Replace regular spaces with no-break spaces

The one place where this falls down is when using currency names, as they often have spaces (e.g. “Swiss Franc”). You probably want to guard against this by doing some extra checking.
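One possible guard, sketched below, is to replace only the spaces that sit directly next to a digit, so that spaces inside a currency name are left alone. The keepTogether name is hypothetical, and the sketch assumes ASCII digits, so it would need extending for other numbering systems.

```javascript
// Hypothetical helper: make only the spaces adjacent to a digit non-breaking.
function keepTogether(formattedNum) {
    return formattedNum
        .replace(/(\d) /g, "$1\u00A0")  // space after a digit
        .replace(/ (\d)/g, "\u00A0$1"); // space before a digit
}

// The space inside "Swiss francs" stays a regular space
console.log(keepTogether("1 234,50 Swiss francs"));
```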

