I am curious how the Python source code sets the value of Py_FileSystemDefaultEncoding, and I have run into something strange.
The Python documentation for sys.getfilesystemencoding() says:
On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed.
I am using Python 2.7.6:
```
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>> import locale
>>> locale.nl_langinfo(locale.CODESET)
'ANSI_X3.4-1968'
```
Here is the question: why does getfilesystemencoding() return a different value from locale.nl_langinfo(), when the documentation says that getfilesystemencoding() is derived from nl_langinfo(CODESET)?
Here is the locale command output in my terminal:
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=zh_CN.UTF-8
LC_TIME=zh_CN.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=zh_CN.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=zh_CN.UTF-8
LC_NAME=zh_CN.UTF-8
LC_ADDRESS=zh_CN.UTF-8
LC_TELEPHONE=zh_CN.UTF-8
LC_MEASUREMENT=zh_CN.UTF-8
LC_IDENTIFICATION=zh_CN.UTF-8
LC_ALL=
Summary: sys.getfilesystemencoding() behaves as documented. The confusion is due to the difference between setlocale(LC_CTYPE, "") (the user's preference) and the default C locale.
The script always starts with the default C locale, which is why locale.nl_langinfo(locale.CODESET) returns 'ANSI_X3.4-1968' in the session above.
But getfilesystemencoding() uses the user's locale. An empty string as a locale name selects a locale based on the user's choice of the appropriate environment variables.
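The difference can be reproduced from Python itself. The following is a minimal sketch, assuming a Unix system (locale.nl_langinfo is not available on Windows); the helper name codeset_for is made up for illustration and is not part of any library. It temporarily switches LC_CTYPE, queries the codeset, and then puts the previous locale back.

```python
import locale


def codeset_for(locale_name):
    """Return nl_langinfo(CODESET) with LC_CTYPE temporarily set to locale_name.

    codeset_for is a hypothetical helper, used here only for illustration.
    """
    # Calling setlocale() without a locale argument only queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Put the previous LC_CTYPE locale back, whatever happened above.
        locale.setlocale(locale.LC_CTYPE, saved)


for name, label in (("C", "default C locale"), ("", "user's locale")):
    try:
        print("%s: %s" % (label, codeset_for(name)))
    except locale.Error as exc:
        # Raised when the environment names a locale that is not installed.
        print("%s: unavailable (%s)" % (label, exc))
```

With the environment shown in the question, the first line should report the same codeset as the interactive session ('ANSI_X3.4-1968') and the second should report the UTF-8 codeset selected by LANG and LC_CTYPE.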
There are two places in the source code for Python 2.7:
bltinmodule.c specifies Py_FileSystemDefaultEncoding on Windows and OS X.
Py_InitializeEx() sets it on other Unix systems. Notice that setlocale(LC_CTYPE, "") is called before nl_langinfo(CODESET), and the locale is restored with setlocale(LC_CTYPE, saved_locale) afterwards.
To find these places:
Clone the Python 2.7 source code.
Search for the Py_FileSystemDefaultEncoding *= regex in your editor, e.g. in Emacs:
M-x tags-search RET Py_FileSystemDefaultEncoding *= RET
and M-, to continue the search.
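If Emacs is not at hand, the same regex search can be approximated with a short script. This is only a sketch: the function name find_assignments and the checkout directory name "cpython" are assumptions for illustration, so adjust the path to wherever the source tree actually lives.

```python
import os
import re

# The same regex as in the Emacs search above.
PATTERN = re.compile(r"Py_FileSystemDefaultEncoding *=")


def find_assignments(root):
    """Yield (path, line_number, line) for matching lines in .c/.h files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as source:
                for number, raw in enumerate(source, 1):
                    # latin-1 maps every byte to a character, so decoding never fails.
                    line = raw.decode("latin-1")
                    if PATTERN.search(line):
                        yield path, number, line.rstrip()


# "cpython" is a hypothetical checkout directory; nothing is printed if it does not exist.
for hit in find_assignments("cpython"):
    print("%s:%d: %s" % hit)
```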