What is a reliable way of getting allowed locale n

Posted 2019-03-23 08:42

Question:

I'm trying to find a reliable way of finding locale codes to pass to Sys.setlocale.

The ?Sys.setlocale help page just states that the allowed values are OS dependent, and gives these examples:

Sys.setlocale("LC_TIME", "de")     # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8")   # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8")  # ditto
Sys.setlocale("LC_TIME", "de_DE")  # Mac OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows

Under Linux, the possibilities can be retrieved using

locales <- system("locale -a", intern = TRUE)
##  [1] "C"                    "C.utf8"               "POSIX"               
##  [4] "af_ZA"                "af_ZA.utf8"           "am_ET"
##  ...

I don't have Solaris or Mac machines to hand, but I guess the names those systems expect can be derived from that output using something like:

library(stringr)
unique(str_split_fixed(locales, "_", 2)[, 1])    #Solaris
unique(str_split_fixed(locales, "\\.", 2)[, 1])  #Mac
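
The same trimming can be sketched in base R, without stringr. This is only an illustration: the sample vector below is copied from the first few Linux values shown earlier, and the variable names language_only and no_encoding are made up for this example. Whether the trimmed names are actually accepted on Solaris or Mac is still untested here.

```r
# Sample of `locale -a` output, taken from the Linux listing above.
locales <- c("C", "C.utf8", "POSIX", "af_ZA", "af_ZA.utf8", "am_ET")

# Drop everything from the first underscore onwards (the guessed Solaris form).
language_only <- unique(sub("_.*$", "", locales))

# Drop everything from the first dot onwards (the guessed Mac form).
no_encoding <- unique(sub("\\..*$", "", locales))

print(language_only)
print(no_encoding)
```

Entries with no underscore or dot are left unchanged by sub(), which matches what str_split_fixed(...)[, 1] returns for them.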

Locales on Windows are much more problematic: they require long names of the form “language_country”, for example:

Sys.setlocale("LC_ALL", "German_Germany")

I can't find a reliable reference for the list of locales under Windows. Calling locale -a from the Windows command line fails unless cygwin is installed, and then it returns the same values as under Linux (I'm guessing it's accessing values in a standard C library.)

There doesn't seem to be a list of locales packaged with R (I thought there might be something similar to share/zoneinfo/zone.tab, which contains time zone details).

My current best strategy is to browse this webpage from Microsoft and form the name by manipulating the SUBLANG column of the table.

http://msdn.microsoft.com/en-us/library/dd318693.aspx

Some guesswork is needed, for example the locale related to SUBLANG_ENGLISH_UK is English_United Kingdom.

Sys.setlocale("LC_ALL", "English_United Kingdom")

Where there are variants in different alphabets, parentheses are needed.

Sys.setlocale("LC_ALL", "Uzbek (Latin)_Uzbekistan")
Sys.setlocale("LC_ALL", "Uzbek (Cyrillic)_Uzbekistan")

This guesswork wouldn't be too bad, but many locales don't work at all, including most Indian locales.

Sys.setlocale("LC_ALL", "Hindi_India")
Sys.setlocale("LC_ALL", "Tamil_India")
Sys.setlocale("LC_ALL", "Sindhi_Pakistan")
Sys.setlocale("LC_ALL", "Nynorsk_Norway")
Sys.setlocale("LC_ALL", "Amharic_Ethiopia")
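
Rather than trying names one at a time, a small helper can probe a batch of candidates. This is a rough sketch, not a definitive test: locale_works is a hypothetical name, and it relies on Sys.setlocale returning an empty string (with a warning) when the request cannot be honoured. It probes LC_TIME only, on the assumption that a name accepted for one category is usually accepted for the others, and it restores the previous setting afterwards.

```r
# Hypothetical helper: TRUE if the OS accepts `loc` for the given category.
locale_works <- function(loc, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  # Put the original setting back when the function exits.
  on.exit(suppressWarnings(Sys.setlocale(category, old)), add = TRUE)
  # A failed request returns "" and raises a warning, which is silenced here.
  res <- suppressWarnings(Sys.setlocale(category, loc))
  nzchar(res)
}

candidates <- c("Hindi_India", "Tamil_India", "English_United Kingdom", "C")
print(vapply(candidates, locale_works, logical(1)))
```

The result is a named logical vector, so the failing names can be picked out with names(result)[!result]. The outcome depends entirely on the OS and the locales installed on it.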

The Windows Region and Language dialog box (Windows\System32\intl.cpl) has a similar but not identical list of available locales, but I don't know where that list is populated from.

There are several related questions:
1. Mac and Solaris people: please can you check to see if my code for getting locales works under your OS.
2. Indian/Pakistani/Norwegian/Ethiopian people using Windows: Please can you tell me what Sys.getlocale() returns for you.
3. Other Windows people: Is there any better documentation on which locales are available?

Update: After clicking links in the question that Ben B mentioned, I stumbled across this better list of locales in Windows. By manually changing the locale using the Region and Language dialog and calling Sys.getlocale(), I deduced that Nynorsk is "Norwegian-Nynorsk_Norway". There are still many oddities, for example

Sys.setlocale(, "Inuktitut (Latin)_Canada")

is fine, but

Sys.setlocale(, "Inuktitut (Syllabics)_Canada")

fails (as do most of the Indian languages). Starting R in any of these locales causes a warning, and R's locale to revert to C.
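
For anyone repeating the "change the dialog, then ask R" approach, here is a sketch that reads the name R reports for each category separately. The five category names are the standard ones accepted by Sys.getlocale; the variable names are made up for this example, and the last step assumes the encoding or code page follows the first dot, as in the names shown in this thread.

```r
# Categories that Sys.getlocale accepts on all platforms.
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")

# One locale name per category, as a named character vector.
current <- vapply(categories, Sys.getlocale, character(1))
print(current)

# Strip a trailing encoding or code page, if one is present.
print(sub("\\..*$", "", current))
```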

I'm still interested to hear from any Indians, etc., as to what locale you have.

Answer 1:

In answer to your first question, here's the output on my Mac:

> locales <- system("locale -a", intern = TRUE)
> library(stringr)
> unique(str_split_fixed(locales, "\\.", 2)[, 1]) 
 [1] "af_ZA" "am_ET" "be_BY" "bg_BG" "ca_ES" "cs_CZ" "da_DK" "de_AT" "de_CH"
[10] "de_DE" "el_GR" "en_AU" "en_CA" "en_GB" "en_IE" "en_NZ" "en_US" "es_ES"
[19] "et_EE" "eu_ES" "fi_FI" "fr_BE" "fr_CA" "fr_CH" "fr_FR" "he_IL" "hi_IN"
[28] "hr_HR" "hu_HU" "hy_AM" "is_IS" "it_CH" "it_IT" "ja_JP" "kk_KZ" "ko_KR"
[37] "lt_LT" "nl_BE" "nl_NL" "no_NO" "pl_PL" "pt_BR" "pt_PT" "ro_RO" "ru_RU"
[46] "sk_SK" "sl_SI" "sr_YU" "sv_SE" "tr_TR" "uk_UA" "zh_CN" "zh_HK" "zh_TW"
[55] "C"     "POSIX"

I'm not sure what I'm expecting to see with Sys.setlocale() but it doesn't throw any errors:

> Sys.setlocale(locale="he_IL")
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
> Sys.getlocale()
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
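
One way to see whether the change had any effect is to format a date with day and month names, which depend on LC_TIME. This is a sketch under assumptions: show_date_in_locale is a hypothetical helper, the date is arbitrary, and the function returns NA when the OS rejects the locale name.

```r
# Hypothetical helper: format a date using the day and month names of `loc`.
show_date_in_locale <- function(loc, date = as.Date("2012-08-14")) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous LC_TIME setting on exit.
  on.exit(suppressWarnings(Sys.setlocale("LC_TIME", old)), add = TRUE)
  res <- suppressWarnings(Sys.setlocale("LC_TIME", loc))
  # An empty string means the locale name was not accepted.
  if (!nzchar(res)) return(NA_character_)
  format(date, "%A %d %B %Y")
}

print(show_date_in_locale("C"))      # "Tuesday 14 August 2012"
print(show_date_in_locale("he_IL"))  # depends on the locales installed
```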


Answer 2:

Thanks all. I went to the URL that Richie suggested, http://msdn.microsoft.com/en-us/library/dd318693.aspx, and tried LANG_BELARUSIAN on Windows. That didn't work, so I lopped off the "LANG_" and used "BELARUSIAN" by itself. That worked fine.

> bk.date1
[1] "Ma 2012 august 14 11:28:30 "
> ymd_hms(bk.date1, locale = "BELARUSIAN")
[1] "2012-08-14 11:28:30 UTC"