What is the difference between zh_CN.UTF-8 and en_US.UTF-8?

Posted 2019-05-10 21:11

Question:

When working on a *nix system, I always set the locale to en_US.UTF-8, and this lets Chinese text display correctly on stdout.

But I know there is also a zh_CN.UTF-8 locale, so I want to know:
What is the difference between them?
When should I use zh_CN.UTF-8, and when en_US.UTF-8?

Answer 1:

I have zero knowledge of zh itself, but switching between the two locales you mentioned may change how certain characters are treated at word boundaries and how various programs format their output.

For example, LC_CTYPE=zh_CN.UTF-8 will most likely consider characters with accent marks as "being part of a word", whereas LC_CTYPE=en_US.UTF-8 might not consider them part of a word.

The same goes for date and currency formats: I'm pretty sure zh_CN uses different date and currency formats than en_US.

To give you a concrete example, here is what I get from date(1) with two different locales on a relatively recent Ubuntu GNU/Linux system:

user@devbook:~$ LC_TIME=fi_FI.UTF-8 date
to 16.1.2014 07.14.36 +0200
user@devbook:~$ LC_TIME=en_US.UTF-8 date
Thu Jan 16 07:14:42 EET 2014
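The same comparison can be tried with the two locales from the question. This is only a minimal sketch: it assumes both locales are installed (if one is missing, date typically falls back to the C locale format), and show_date is a hypothetical helper name, not a standard command.

```shell
#!/bin/sh
# show_date is a hypothetical helper: print the current date using
# the given locale for the LC_TIME category only.
show_date() {
    # LC_ALL is cleared first because, when set, it overrides LC_TIME.
    LC_ALL= LC_TIME="$1" date
}

# Print one line per locale so the two formats can be compared by eye.
for loc in en_US.UTF-8 zh_CN.UTF-8; do
    printf '%s: ' "$loc"
    show_date "$loc"
done
```

Only LC_TIME is changed here, so messages, sorting, and the other categories keep whatever the environment already sets.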


Answer 2:

According to the documentation here:

A locale consists of a number of categories for which country-dependent formatting or other specifications exist. A program's locale defines its code sets, date and time formatting conventions, monetary conventions, decimal formatting conventions, and collation (sort) order.

If two locales both have UTF-8 in their names, they use the same encoding. They differ in their locale-dependent settings: for example, the time format, as @Sami Laine has already pointed out, or the currency symbol, which is ¥ in zh_CN.UTF-8 and $ in en_US.UTF-8.
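Both points can be checked with locale(1), which prints the value of a single keyword such as charmap or currency_symbol. The sketch below assumes both locales are installed; on a system where one is missing, the values shown are usually those of the fallback C locale. locale_value is a hypothetical helper name.

```shell
#!/bin/sh
# locale_value is a hypothetical helper: print one locale keyword
# (for example charmap or currency_symbol) under the given locale.
locale_value() {
    # Warnings about a missing locale are discarded to keep the output short.
    LC_ALL="$1" locale "$2" 2>/dev/null
}

# Show the encoding and the currency symbol side by side for each locale.
for loc in en_US.UTF-8 zh_CN.UTF-8; do
    printf '%s: charmap=%s currency_symbol=%s\n' "$loc" \
        "$(locale_value "$loc" charmap)" \
        "$(locale_value "$loc" currency_symbol)"
done
```

If both lines report the same charmap but different currency symbols, that matches the description above: same encoding, different conventions.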

More complete list of differences

According to the source linked here, the following script gives a more complete list of differences between the two locales:

CATS="LC_CTYPE LC_COLLATE LC_MONETARY LC_NUMERIC LC_TIME LC_MESSAGES"
LANG=en_US.utf8 locale -k $CATS > en_US.utf8.out
LANG=zh_CN.utf8 locale -k $CATS > zh_CN.utf8.out

diff en_US.utf8.out zh_CN.utf8.out

The diff output should show the differences between the two locales in more detail.
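The comparison is only meaningful if both locales actually exist on the machine, so it may be worth checking that first. Here is a rough sketch, assuming locale -a lists the installed locales; has_locale and normalize are hypothetical helper names.

```shell
#!/bin/sh
# normalize is a hypothetical helper: lower-case a locale name and drop
# hyphens, since systems spell the codeset differently
# (en_US.utf8 on some, en_US.UTF-8 on others).
normalize() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-'
}

# has_locale is a hypothetical helper: succeed if the named locale
# appears in the output of `locale -a`.
has_locale() {
    want=$(normalize "$1")
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF -- "$want"
}

# Report whether each locale used by the comparison is available.
for loc in en_US.utf8 zh_CN.utf8; do
    if has_locale "$loc"; then
        echo "$loc: installed"
    else
        echo "$loc: not installed"
    fi
done
```

How a missing locale gets installed depends on the distribution, so that step is left out here.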



Tags: unix utf-8