How can I determine what the alphabet for a locale is?

Posted 2019-01-23 19:58

I would like to determine what the alphabet for a given locale is, preferably based on the browser's Accept-Language header values. Does anyone know how to do this, using a library if necessary?

5 answers
Ridiculous、
#2 · 2019-01-23 20:14

Take a look at [LocaleData.getExemplarSet][1] in ICU4J.

For English, for example, it returns abcdefghijklmnopqrstuvwxyz.

[1]: http://icu-project.org/apiref/icu4j/com/ibm/icu/util/LocaleData.html#getExemplarSet(com.ibm.icu.util.ULocale, int)

淡お忘
#3 · 2019-01-23 20:22

If you just want to know the name of an appropriate character set for a user's locale, then you might try the java.nio.charset.Charset class.

If you really want to use the Accept-Language header, then there's an old O'Reilly article on this matter which introduces a pretty handy class called LanguageNegotiator.

I think one of those will give you a decent enough start.
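For the Accept-Language part, a minimal sketch using only the JDK (Java 8 or later) rather than the LanguageNegotiator class from the article. The helper name `pickLocale` and the list of supported locales are assumptions made up for illustration; `Locale.LanguageRange.parse` and `Locale.lookup` are the standard library calls doing the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AcceptLanguageDemo {
    // Hypothetical helper: choose the best supported locale for a raw Accept-Language value.
    static Locale pickLocale(String acceptLanguage, List<Locale> supported, Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback; // header missing or empty
        }
        try {
            // Parses e.g. "de-CH,de;q=0.8,en;q=0.5"; the result is ordered by descending weight.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            // RFC 4647 lookup: tries each range, truncating subtags, and returns null if nothing matches.
            Locale match = Locale.lookup(ranges, supported);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException malformedHeader) {
            return fallback; // ill-formed range or weight
        }
    }

    public static void main(String[] args) {
        // Assumed set of locales the application supports.
        List<Locale> supported = Arrays.asList(Locale.ENGLISH, Locale.GERMAN, Locale.forLanguageTag("ru"));
        Locale chosen = pickLocale("de-CH,de;q=0.8,en;q=0.5", supported, Locale.ENGLISH);
        System.out.println(chosen.toLanguageTag());
    }
}
```

This only settles which locale to use; it says nothing yet about that locale's alphabet, which still has to come from locale data such as ICU's.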

趁早两清
#4 · 2019-01-23 20:27

This is an English answer written in Århus. Yesterday, I heard some Germans say 'Blödheit, à propos, ist dumm'. However, one of them wore a shirt that said 'I know the difference between 文字 and الْعَرَبيّة'.

What's the answer to your question for this text? Is it allowed? Isn't this an English text?
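To make the point concrete, here is a rough sketch that lists which Unicode scripts occur in a piece of text, using the JDK's `Character.UnicodeScript`. The class and method names are made up for illustration, and the sample string is an abbreviated, escaped version of the words above (A-ring in Århus, o-umlaut in Blödheit, the two Han characters, and a few Arabic letters).

```java
import java.util.EnumSet;
import java.util.Set;

public class ScriptsInText {
    // Hypothetical helper: collect the Unicode script of every letter in the text.
    static Set<Character.UnicodeScript> scriptsOf(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter) // ignore spaces, digits, punctuation and combining marks
            .forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    public static void main(String[] args) {
        // \u00C5 = A with ring, \u00F6 = o with umlaut, \u6587\u5B57 = Han, \u0627... = Arabic letters
        String sample = "\u00C5rhus Bl\u00F6dheit \u6587\u5B57 \u0627\u0644\u0639\u0631\u0628";
        System.out.println(scriptsOf(sample));
    }
}
```

An "English" text can therefore contain several scripts, and even its Latin letters go beyond a to z, so a per-locale alphabet is at best a hint about what to expect.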

别忘想泡老子
#5 · 2019-01-23 20:30

The International Components for Unicode might help here. Specifically the UScript class looks promising.

Out of curiosity: What do you need it for?

来,给爷笑一个
#6 · 2019-01-23 20:32

It depends on how specific you want to get. One place to look would be the "Suppress-Script" fields in the IANA Language Subtag Registry.

Some languages have multiple "alphabets" that can be used for writing. For example, Azerbaijani can be written in Latin or Arabic script. Most languages, like English, are written almost exclusively in a single script, so the correct script goes without saying, and should be "suppressed" in language codes.

So, looking at the entry for Russian, you can tell that the preferred script is Cyrillic, while for Amharic, it is Ethiopic. But the entries for German, Norwegian, and English aren't more specific than "Latin". So, with this method, you'd have a hard time hiding umlauts and thorns from Americans, or offering any script to a Kashmiri writer, since Kashmiri has no Suppress-Script entry.
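A small sketch of that lookup, under clearly stated assumptions: the map below is a hand-copied excerpt of a few Suppress-Script values, not the full registry, and the `scriptFor` helper is hypothetical. An explicit script subtag in the tag (as in `az-Arab`) wins; otherwise the table is consulted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ScriptForTag {
    // Assumption: a tiny hand-copied excerpt of Suppress-Script values, for illustration only.
    static final Map<String, String> SUPPRESS_SCRIPT = new HashMap<>();
    static {
        SUPPRESS_SCRIPT.put("en", "Latn");
        SUPPRESS_SCRIPT.put("de", "Latn");
        SUPPRESS_SCRIPT.put("no", "Latn");
        SUPPRESS_SCRIPT.put("ru", "Cyrl");
        SUPPRESS_SCRIPT.put("am", "Ethi");
    }

    // Hypothetical helper: returns an ISO 15924 script code, or "" when none can be determined.
    static String scriptFor(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        if (!locale.getScript().isEmpty()) {
            return locale.getScript(); // script stated explicitly in the tag
        }
        return SUPPRESS_SCRIPT.getOrDefault(locale.getLanguage(), "");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"ru-RU", "az-Arab", "en-US", "ks"}) {
            System.out.println(tag + " -> '" + scriptFor(tag) + "'");
        }
    }
}
```

As the answer says, this yields a script at most, never the individual letters, and it comes back empty for languages such as Kashmiri that have no Suppress-Script entry.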
