I'm trying to find a reliable way of finding locale codes to pass to Sys.setlocale.
The ?Sys.setlocale help page just states that the allowed values are OS dependent, and gives these examples:
Sys.setlocale("LC_TIME", "de") # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8") # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8") # ditto
Sys.setlocale("LC_TIME", "de_DE") # Mac OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows
Under Linux, the possibilities can be retrieved using
locales <- system("locale -a", intern = TRUE)
## [1] "C" "C.utf8" "POSIX"
## [4] "af_ZA" "af_ZA.utf8" "am_ET"
## ...
I don't have Solaris or Mac machines to hand, but I guess that the values for those platforms can be derived from the Linux output using something like:
library(stringr)
unique(str_split_fixed(locales, "_", 2)[, 1]) #Solaris
unique(str_split_fixed(locales, "\\.", 2)[, 1]) #Mac
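For anyone without stringr installed, the same two derivations can be sketched in base R with sub(). The sample vector is just the handful of values shown in the locale -a output above; whether Solaris or Mac actually accept the resulting names is still the untested assumption.

```r
# Sample values copied from the `locale -a` output shown above.
locales <- c("C", "C.utf8", "POSIX", "af_ZA", "af_ZA.utf8", "am_ET")

# Drop everything from the first underscore onwards (the "de" style).
language_only <- unique(sub("_.*$", "", locales))

# Drop everything from the first dot onwards (the "de_DE" style).
no_encoding <- unique(sub("\\..*$", "", locales))

print(language_only)
print(no_encoding)
```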
Locales on Windows are much more problematic: they require long names of the form “language_country”, for example:
Sys.setlocale("LC_ALL", "German_Germany")
I can't find a reliable reference for the list of locales under Windows. Calling locale -a from the Windows command line fails unless cygwin is installed, and then it returns the same values as under Linux (I'm guessing it's accessing values in a standard C library.)
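To keep a script from erroring on platforms where the locale command is missing, the call can be wrapped in a guard. This is only a sketch: list_locales is a hypothetical helper name, and returning an empty vector on Windows is my own choice, made because the cygwin output uses Linux-style names rather than the ones Windows R expects.

```r
# Hypothetical helper: returns the output of `locale -a` on Unix-alikes,
# or an empty character vector where the command cannot be used.
list_locales <- function() {
  # Skip Windows entirely, and any system without a `locale` executable.
  if (.Platform$OS.type != "unix" || !nzchar(Sys.which("locale"))) {
    return(character(0))
  }
  system2("locale", "-a", stdout = TRUE, stderr = FALSE)
}

locales <- list_locales()
head(locales)
```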
There doesn't seem to be a list of locales packaged with R (I thought there might be something similar to share/zoneinfo/zone.tab that contains time zone details).
My current best strategy is to browse this webpage from Microsoft and form the name by manipulating the SUBLANG column of the table.
http://msdn.microsoft.com/en-us/library/dd318693.aspx
Some guesswork is needed; for example, the locale related to SUBLANG_ENGLISH_UK is English_United Kingdom.
Sys.setlocale("LC_ALL", "English_United Kingdom")
Where there are variants in different alphabets, parentheses are needed.
Sys.setlocale("LC_ALL", "Uzbek (Latin)_Uzbekistan")
Sys.setlocale("LC_ALL", "Uzbek (Cyrillic)_Uzbekistan")
This guesswork wouldn't be too bad, but many locales don't work at all, including most Indian locales.
Sys.setlocale("LC_ALL", "Hindi_India")
Sys.setlocale("LC_ALL", "Tamil_India")
Sys.setlocale("LC_ALL", "Sindhi_Pakistan")
Sys.setlocale("LC_ALL", "Nynorsk_Norway")
Sys.setlocale("LC_ALL", "Amharic_Ethiopia")
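Rather than trying names one at a time, the guesses can be checked in bulk. The sketch below relies on Sys.setlocale returning an empty string (with a warning) when the request cannot be honoured. locale_works is a hypothetical helper name, and it defaults to LC_TIME rather than LC_ALL purely so that the previous setting is simple to restore; the results will differ from machine to machine.

```r
# Hypothetical helper: TRUE if the OS accepts `locale` for `category`.
locale_works <- function(locale, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  # Put the original setting back however the function exits.
  on.exit(suppressWarnings(Sys.setlocale(category, old)), add = TRUE)
  result <- suppressWarnings(Sys.setlocale(category, locale))
  # A failed request yields "" (or NULL if locale info is unavailable).
  isTRUE(nzchar(result))
}

candidates <- c("Hindi_India", "Tamil_India", "Sindhi_Pakistan",
                "Nynorsk_Norway", "Amharic_Ethiopia")
vapply(candidates, locale_works, logical(1))
```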
The Windows Region and Language dialog box (Windows\System32\intl.cpl, see pic) has a similar but not identical list of available locales, but I don't know where that is populated from.
There are several related questions:
1. Mac and Solaris people: please can you check to see if my code for getting locales works under your OS.
2. Indian/Pakistani/Norwegian/Ethiopian people using Windows: Please can you tell me what Sys.getlocale() returns for you.
3. Other Windows people: Is there any better documentation on which locales are available?
Update: After clicking links in the question that Ben B mentioned, I stumbled across this better list of locales in Windows. By manually changing the locale using the Region and Language dialog and calling Sys.getlocale(), I deduced that Nynorsk is "Norwegian-Nynorsk_Norway". There are still many oddities, for example
Sys.setlocale(, "Inuktitut (Latin)_Canada")
is fine, but
Sys.setlocale(, "Inuktitut (Syllabics)_Canada")
fails (as do most of the Indian languages). Starting R in any of these locales causes a warning, and R's locale reverts to C.
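If you do want to report what your machine uses, querying each category separately gives output that is easier to compare across systems than the single combined string. A small sketch follows; the choice of categories is mine, picked from the ones that ?Sys.setlocale documents.

```r
# Categories to query; all are listed in ?Sys.setlocale.
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY",
                "LC_NUMERIC", "LC_TIME")

# Named character vector with the locale in use for each category.
current <- vapply(categories, Sys.getlocale, character(1))
print(current)
```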
I'm still interested to hear from any Indians, etc., as to what locale you have.