UnicodeEncodeError when using the compile function

Posted 2019-04-29 20:46

Using Python 3.2 on Windows 7, I get the following in IDLE:

>>> compile('pass', r'c:\temp\工具\module1.py', 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

Can anybody explain why compile() tries to encode the Unicode filename using mbcs? I know that sys.getfilesystemencoding() returns 'mbcs' on Windows, but I thought this encoding was not used when Unicode file names are provided.

For example:

f = open(r'c:\temp\工具\module1.py') 

works.

For a more complete test, save the following in a UTF-8 encoded file and run it with the standard python.exe, version 3.2:

# -*- coding: utf8 -*-
fname = r'c:\temp\工具\module1.py'
# I do have a file named fname, but you can comment out the following two lines
f = open(fname)
print('ok')
cmp = compile('pass', fname, 'exec')
print(cmp)

Output:

ok
Traceback (most recent call last):
  File "module8.py", line 6, in <module>
    cmp = compile('pass', fname, 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

3 answers
贼婆χ
#2 · 2019-04-29 20:48

Here is a solution that worked for me, taken from Issue 427 (UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-6: ordinal not in range(128)):

If you look at the PyScripter help file, the last paragraph of the topic "Encoded Python Source Files" tells you how to configure Python to support other encodings by modifying the site.py file. This file is in the lib subdirectory of the Python installation directory. Find the function setencoding and make sure that support for locale-aware default string encodings is switched on (see below).

def setencoding():
  """Set the string encoding used by the Unicode implementation.  The
  default is 'ascii', but if you're willing to experiment, you can
  change this."""
  encoding = "ascii" # Default value set by _PyUnicode_Init()
  if 0:  # <-- set this to 1
      # Enable to support locale aware default string encodings.
      import locale
      loc = locale.getdefaultlocale ()
      if loc[1]:
          encoding = loc[1]
  if 0:
      # Enable to switch off string to Unicode coercion and implicit
      # Unicode to string conversion.
      encoding = "undefined"
  if encoding != "ascii":
      # On Non-Unicode builds this will raise an AttributeError...
      sys.setdefaultencoding (encoding) # Needs Python Unicode build !
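
Before editing site.py, it may help to see which encodings the interpreter actually reports. The following is a minimal diagnostic sketch and not part of the quoted fix; the helper name report_encodings is made up for illustration.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python reports for this interpreter."""
    return {
        # Default string encoding used by the Unicode implementation.
        "default": sys.getdefaultencoding(),
        # Encoding used to convert file names for the operating system.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding the current locale prefers for text data.
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print(name, "=", value)
```

If "filesystem" comes back as 'mbcs', file names handed to the C internals are limited to what the active ANSI code page can represent.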
小情绪 Triste *
#3 · 2019-04-29 20:50

Judging from Python issue 10114, the logic seems to be that every filename Python uses should be valid on the platform where it is used. The filename is therefore encoded with the filesystem encoding before it is passed to the C internals of Python.

I agree that it probably shouldn't raise an error on Windows, because any Unicode filename is valid there. You may wish to file a bug report with Python for this. But be aware that the necessary changes might not be trivial, because any C code using the filename needs a fallback for the case where it can't be encoded.
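
The encoding step described above can be tried by hand, without Windows. This is a minimal sketch, assuming cp1252 as a stand-in for the ANSI code page behind 'mbcs' on a Western-European Windows install (the mbcs codec itself only exists on Windows). The helpers is_encodable and ascii_safe_name are hypothetical, and the fallback changes the name stored in the code object, so tracebacks would show the escaped form.

```python
def is_encodable(filename, encoding):
    """Return True if filename can be encoded strictly with encoding."""
    try:
        filename.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def ascii_safe_name(filename):
    """Escape non-ASCII characters so the result is pure ASCII."""
    return filename.encode("ascii", "backslashreplace").decode("ascii")


fname = r"c:\temp\工具\module1.py"

# cp1252 is an assumed stand-in for a Western 'mbcs' code page.
print(is_encodable(fname, "cp1252"))
# A Chinese code page such as gbk can represent these characters.
print(is_encodable(fname, "gbk"))

# Possible workaround: hand compile() a name that is pure ASCII.
code = compile("pass", ascii_safe_name(fname), "exec")
print(code.co_filename)
```

The trade-off is that co_filename no longer matches the real path on disk, so tools that look the source up by that name would not find it.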

Emotional °昔
#4 · 2019-04-29 20:56

You could try changing each "\" in the file path to "/", that is, replace

compile('pass', r'c:\temp\工具\module1.py', 'exec')

with

compile('pass', r'c:/temp/工具/module1.py', 'exec')

I ran into a problem just like yours and solved it this way. I hope it works for you too.
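
The separator change can also be done programmatically instead of by hand. This is a minimal sketch, assuming Python 3.4 or later for pathlib (on 3.2, fname.replace('\\', '/') has the same effect for this path); to_forward_slashes is a hypothetical helper. The sketch only rewrites the separators and leaves the non-ASCII characters in place, so it does not show whether the error goes away on Python 3.2.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    """Rewrite a Windows path with '/' separators; other characters are kept."""
    return PureWindowsPath(path).as_posix()


fname = to_forward_slashes(r"c:\temp\工具\module1.py")
print(fname)

# The converted name is passed to compile() exactly as in the answer above.
code = compile("pass", fname, "exec")
```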
