Java internationalization (i18n) with proper plura

2019-03-08 07:54发布

问题:

I was going to use Java's standard i18n system with the ChoiceFormat class for plurals, but then realized that it doesn't handle the complex plural rules of some languages (e.g. Polish). If it only handles languages that resemble English, then it seems a little pointless.

What options are there to achieve correct plural forms? What are the pros and cons of using them?

回答1:

Well, you already tagged the question correctly, so I assume you know thing or two about ICU.

With ICU you have two choices for proper handling of plural forms:

  • PluralRules, which gives you the rules for given Locale
  • PluralFormat, which uses aforementioned rules to allow formatting

Which one to use? Personally, I prefer to use PluralRules directly, to select appropriate message from the resource bundles.

ULocale uLocale = ULocale.forLanguageTag("pl-PL");
ResourceBundle resources = ResourceBundle.getBundle( "path.to.messages",
                               uLocale.toLocale());
PluralRules pluralRules = PluralRules.forLocale(uLocale);

double[] numbers = { 0, 1, 1.5, 2, 2.5, 3, 4, 5, 5.5, 11, 12, 23 };
for (double number : numbers) { 
  String resourceKey = "some.message.plural_form." + pluralRules.select(number);
  String message = "!" + resourceKey + "!";
  try {
    message = resources.getString(resourceKey);
    System.out.println(format(message, uLocale, number));
   } catch (MissingResourceException e) { // Log this } 
}

Of course you (or the translator) would need to add the proper forms to properties file, in this example let's say:

some.message.plural_form.one=Znaleziono {0} plik
some.message.plural_form.few=Znaleziono {0} pliki
some.message.plural_form.many=Znaleziono {0} plików
some.message.plural_form.other=Znaleziono {0} pliku

For other languages (i.e. Arabic) you might also need to use "zero" and "two" keywords, see CLDR's language plural rules for details.

Alternatively you can use PluralFormat to select valid form. Usual examples show direct instantiation, which totally doesn't make sense in my opinion. It is easier to use it with ICU's MessageFormat:

String pattern = "Znaleziono {0,plural,one{# plik}" +
                 "few{# pliki}" +
                 "many{# plików}" +
                 "other{# pliku}}";
MessageFormat fmt = new MessageFormat(pattern, ULocale.forLanguageTag("pl-PL"));
StringBuffer result = new StringBuffer();
FieldPosition zero = new FieldPosition(0);
double[] theNumber = { number };
fmt.format(theNumber, result, zero);

Of course, realistically you would not hardcode th pattern string, but place something like this in the properties file:

some.message.pattern=Found {0,plural,one{# file}other{# files}}

The only problem with this approach is, the translator must be aware of the placeholder format. Another issue, which I tried to show in the code above is, MessageFormat's static format() method (the one that is easy to use) always formats for the default Locale. This might be a real problem in web applications, where the default Locale typically means the server's one. Thus I had to format for a specific Locale (floating point numbers, mind you) and the code looks rather ugly...

I still prefer the PluralRules approach, which to me is much cleaner (although it needs to use the same message formatting style, only wrapped with helper method).



回答2:

ChoiceFormat, as explained here seems flexible enough to deal with any sort of pluralization you might throw at it.

EDIT: as Dr.Haribo pointed out in his comment, ChoiceFormat is not sufficient for Polish pluralization. But a followup from the same blog suggests ICU4J that handles more complex pluralization rules