Where Py_FileSystemDefaultEncoding is set in Python

Published 2019-05-11 03:46

Question:

I am curious about how the Python source code sets the value of Py_FileSystemDefaultEncoding, and I have run into something strange.

The Python documentation for sys.getfilesystemencoding() says:

On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed.

I use Python 2.7.6:

```
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>> import locale
>>> locale.nl_langinfo(locale.CODESET)
'ANSI_X3.4-1968'
```
Here is the question: why does getfilesystemencoding() return a different value from locale.nl_langinfo(locale.CODESET), when the documentation says that getfilesystemencoding() is derived from nl_langinfo(CODESET)?

Here is the locale command output in my terminal:

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=zh_CN.UTF-8
LC_TIME=zh_CN.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=zh_CN.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=zh_CN.UTF-8
LC_NAME=zh_CN.UTF-8
LC_ADDRESS=zh_CN.UTF-8
LC_TELEPHONE=zh_CN.UTF-8
LC_MEASUREMENT=zh_CN.UTF-8
LC_IDENTIFICATION=zh_CN.UTF-8
LC_ALL=

Answer 1:

Summary: sys.getfilesystemencoding() behaves as documented. The confusion is due to the difference between setlocale(LC_CTYPE, "") (user's preference) and the default C locale.


The script always starts with the default C locale:

>>> import locale
>>> locale.nl_langinfo(locale.CODESET)
'ANSI_X3.4-1968'

But getfilesystemencoding() uses the user's locale:

>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>> locale.setlocale(locale.LC_CTYPE, '')
'en_US.UTF-8'
>>> locale.nl_langinfo(locale.CODESET)
'UTF-8'

An empty string as the locale name selects the locale from the user's environment variables, so changing LC_CTYPE changes the result:

$ LC_CTYPE=C python -c 'import sys; print(sys.getfilesystemencoding())'
ANSI_X3.4-1968
$ LC_CTYPE=C.UTF-8 python -c 'import sys; print(sys.getfilesystemencoding())'
UTF-8
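
As a rough illustration of which environment variable decides the LC_CTYPE category when the locale name is an empty string, here is a minimal sketch of the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG). The helper name effective_lc_ctype is hypothetical, and the fallback to "C" when nothing is set is an assumption: POSIX leaves that default to the implementation.

```python
import os


def effective_lc_ctype(env=None):
    """Return (variable_name, value) that would select LC_CTYPE.

    Hypothetical helper: it only mirrors the POSIX lookup order and does
    not call setlocale(), so it cannot tell whether the locale exists.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return name, value
    # Assumption: the implementation-defined default is the C locale.
    return None, "C"


# The question's environment: LC_ALL is empty and LC_CTYPE is only implied.
print(effective_lc_ctype({"LANG": "en_US.UTF-8", "LC_ALL": ""}))
# An explicit LC_CTYPE takes priority over LANG.
print(effective_lc_ctype({"LC_CTYPE": "C", "LANG": "en_US.UTF-8"}))
```

In the `locale` output from the question, the quoted value LC_CTYPE="en_US.UTF-8" indicates that the category is derived from LANG rather than set explicitly, which is why the first call reports LANG.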

Where can I find the source code that sets Py_FileSystemDefaultEncoding?

There are two places in the source code for Python 2.7:

  • bltinmodule.c specifies Py_FileSystemDefaultEncoding on Windows and OS X
  • Py_InitializeEx() sets it on other Unix systems. Note that setlocale(LC_CTYPE, "") is called before nl_langinfo(CODESET), and the previous locale is restored with setlocale(LC_CTYPE, saved_locale) afterwards.
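
The save, switch, query, restore sequence described for Py_InitializeEx() can be approximated from Python with the locale module. This is only a sketch of the idea, not the interpreter's C code; the function name user_preferred_codeset is hypothetical, and it assumes a Unix platform where locale.nl_langinfo is available.

```python
import locale


def user_preferred_codeset():
    """Approximate the startup sequence: save, switch, query, restore."""
    # Calling setlocale() without a locale argument only queries the setting.
    saved_locale = locale.setlocale(locale.LC_CTYPE)
    try:
        try:
            # "" means: take the locale from the user's environment.
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            # C setlocale() would fail and leave the locale unchanged,
            # so continue with whatever locale is currently active.
            pass
        codeset = locale.nl_langinfo(locale.CODESET)
    finally:
        # Put the previous LC_CTYPE back, as the startup code does.
        locale.setlocale(locale.LC_CTYPE, saved_locale)
    # Treat an empty answer as "unknown", matching the documented None.
    return codeset or None


print(user_preferred_codeset())
```

Because the previous locale is restored at the end, a later locale.nl_langinfo(locale.CODESET) call still reports the codeset of whatever locale was active before, which is the mismatch the question observed.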

Can you give me some advice on how to search for keywords in the Python source code?

To find these places:

  • clone Python 2.7 source code:

    $ hg clone https://hg.python.org/cpython && cd cpython
    $ hg update 2.7
    
  • search for the regex Py_FileSystemDefaultEncoding *= in your editor, e.g.:

    $ make TAGS # to create tags table
    

    In Emacs: M-x tags-search RET Py_FileSystemDefaultEncoding *= RET, then M-, to continue the search.
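
If no tags-aware editor is at hand, the same regex search can be sketched in a few lines of Python. The function name find_assignments and the choice to scan only .c and .h files are assumptions made for this example; any grep-like tool would do the same job.

```python
import io
import os
import re

# Same regex as above; note that it also matches "==" comparisons.
PATTERN = re.compile(r"Py_FileSystemDefaultEncoding *=")


def find_assignments(root, extensions=(".c", ".h")):
    """Return (path, line_number, line_text) for every matching line."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" keeps the scan going on odd byte sequences.
            with io.open(path, encoding="utf-8", errors="replace") as source:
                for line_number, line in enumerate(source, start=1):
                    if PATTERN.search(line):
                        hits.append((path, line_number, line.strip()))
    return hits


# Example, run from the root of the checkout:
# for path, line_number, text in find_assignments("."):
#     print("%s:%d: %s" % (path, line_number, text))
```

Run from the cpython checkout, the hits should include the two places named above, alongside any other lines that happen to match the pattern.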