When working on a *nix system, I always set the locale to en_US.UTF-8, which lets Chinese text display correctly on stdout.
I know there is also a zh_CN.UTF-8 locale, so I want to know:
What is the difference between them?
When should I use zh_CN.UTF-8 rather than en_US.UTF-8?
I have no knowledge of zh itself, but switching between the two locales you mention may change how certain characters are treated at word boundaries and how different programs format their output.
For example, LC_CTYPE=zh_CN.UTF-8 will most likely consider characters with accent marks to be part of a word, whereas LC_CTYPE=en_US.UTF-8 might not.
The same goes for date and currency formats: I'm pretty sure zh uses different date and currency formats than the US.
To give you a concrete example, here is what I get from date(1) with two different locales on a relatively recent Ubuntu GNU/Linux system:
user@devbook:~$ LC_TIME=fi_FI.UTF-8 date
to 16.1.2014 07.14.36 +0200
user@devbook:~$ LC_TIME=en_US.UTF-8 date
Thu Jan 16 07:14:42 EET 2014
According to the documentation here:
A locale consists of a number of categories for which
country-dependent formatting or other specifications exist. A
program's locale defines its code sets, date and time formatting
conventions, monetary conventions, decimal formatting conventions, and
collation (sort) order.
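To see how these categories are set in your own shell, the `locale` utility can be run directly. This is a minimal sketch, assuming a system where `locale -a` reports names such as `en_US.utf8` or `en_US.UTF-8` (the exact spelling varies between systems); the grep pattern is only an illustration:

```shell
# Show the effective value of each locale category (LANG, LC_CTYPE, LC_TIME, ...).
locale

# List installed locales and keep only the two discussed here, if present.
# The name may be spelled utf8 or UTF-8 depending on the system.
locale -a | grep -i -E '^(en_US|zh_CN)\.utf-?8$' || echo "neither locale is installed"
```

If a locale does not show up in `locale -a`, setting it will not take effect until it has been generated or installed.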
If two locales both have UTF-8 in their names, they use the same encoding; they differ only in their locale-dependent settings. Examples are the time format, as @Sami Laine has already pointed out, and the currency symbol: in zh_CN.UTF-8 the money sign is ¥, while in en_US.UTF-8 it is $.
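The currency difference can be checked directly by asking `locale` for individual LC_MONETARY keywords. The sketch below is only an illustration: `show_currency` is a hypothetical helper name, and it assumes both locales are installed. If one is not, the values of the default C locale are typically shown instead, and those are often empty.

```shell
# Print the currency keywords of LC_MONETARY for one locale.
# $1 is a locale name; show_currency is a helper made up for this example.
show_currency() {
    printf '%s: ' "$1"
    # currency_symbol is the local sign, int_curr_symbol the international code.
    LC_ALL="$1" locale currency_symbol int_curr_symbol 2>/dev/null | tr '\n' ' '
    printf '\n'
}

show_currency en_US.UTF-8
show_currency zh_CN.UTF-8
```

Any other keyword name, such as `d_fmt` or `decimal_point`, can be queried the same way.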
More complete list of differences
According to here, a more complete list of differences between the two locales can be obtained by running the following script:
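# Categories to compare
CATS="LC_CTYPE LC_COLLATE LC_MONETARY LC_NUMERIC LC_TIME LC_MESSAGES"
# LC_ALL overrides any LC_* variables already set in the environment; LANG alone would not
LC_ALL=en_US.utf8 locale -k $CATS > en_US.utf8.out
LC_ALL=zh_CN.utf8 locale -k $CATS > zh_CN.utf8.out
diff en_US.utf8.out zh_CN.utf8.out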
The diff output should list every keyword whose value differs between the two locales.