Unicode filename to python subprocess.call() [dupl

Posted 2020-02-01 01:44

再贱就再见

Question:

This question already has answers here:

Unicode filenames on Windows with Python & subprocess.Popen() (5 answers)

Closed 4 years ago.

I'm trying to run subprocess.call() with a Unicode filename. Here is a simplified version of the problem:

import subprocess

n = u'c:\\windows\\notepad.exe '
f = u'c:\\temp\\nèw.txt'

subprocess.call(n + f)

which raises the famous error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'

Encoding to utf-8 produces the wrong filename, and mbcs passes the filename as new.txt, without the accent.

I just can't read any more on this confusing subject; I keep going in circles. I have found a lot of answers here to many different problems in the past, so I thought I would join and ask for help myself.

Thanks

Answer 1:

I found a fine workaround, it's a bit messy, but it works.

subprocess.call passes the text to the child process in its own encoding, which may or may not be the one the system expects. To keep the code portable, you need to look up the machine's encoding at runtime.

The following

import subprocess
import sys
notepad = u'C:\\Notepad.exe'
subprocess.call([notepad.encode(sys.getfilesystemencoding())])

looks up the file system encoding at runtime and encodes the argument with it before handing it to subprocess.call.
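As a minimal sketch of that idea, the encoding step can be wrapped in a small helper. The name encode_for_subprocess and its encoding parameter are hypothetical, introduced only for illustration. Note that this only helps when every character of the argument is representable in the chosen encoding; otherwise the encode step fails or loses information, which is what the question describes for mbcs.

```python
import sys


def encode_for_subprocess(args, encoding=None):
    """Encode text arguments to bytes; leave byte strings untouched.

    `encoding` defaults to the file system encoding reported by Python.
    (Hypothetical helper, not part of the subprocess module.)
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding() or "utf-8"
    encoded = []
    for arg in args:
        if isinstance(arg, bytes):
            encoded.append(arg)  # already bytes, pass through unchanged
        else:
            encoded.append(arg.encode(encoding))
    return encoded


# Explicit encodings, so the result does not depend on the machine:
utf8_args = encode_for_subprocess([u"n\u00e8w.txt", b"raw"], "utf-8")
cp1252_args = encode_for_subprocess([u"n\u00e8w.txt"], "cp1252")
```

The returned list would then be passed to subprocess.call in place of the original arguments.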

As a side note, I have also found that if you compose a string with the current directory, using

os.getcwd()

Python (or the OS, I don't know which) will mess up directories with accented characters. To prevent this, I have found the following to work:

os.getcwd().decode(sys.getfilesystemencoding())

Which is very similar to the solution above.
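A small sketch of the same decoding step, written so that it also runs on Python 3, where os.getcwd() already returns text and has no decode method. The helper name cwd_text is hypothetical.

```python
import os
import sys


def cwd_text():
    """Return the current directory as text on both Python 2 and Python 3."""
    cwd = os.getcwd()
    if isinstance(cwd, bytes):
        # Python 2: os.getcwd() returns a byte string, so decode it.
        cwd = cwd.decode(sys.getfilesystemencoding())
    return cwd


current_dir = cwd_text()
```

On Python 2, os.getcwdu() returns the current directory as a unicode object directly, which may be a simpler choice there.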

Hope it helps.

Answer 2:

If your file exists, you can use its short filename (also known as the 8.3 name). This name is defined only for existing files, and it should cause no trouble for programs that are not Unicode-aware when passed as an argument.

One way to obtain one (needs Pywin32 to be installed):

import win32api
short_path = win32api.GetShortPathName(unicode_path)

Alternatively, you can also use ctypes:

import ctypes
import ctypes.wintypes

ctypes.windll.kernel32.GetShortPathNameW.argtypes = [
    ctypes.wintypes.LPCWSTR, # lpszLongPath
    ctypes.wintypes.LPWSTR, # lpszShortPath
    ctypes.wintypes.DWORD # cchBuffer
]
ctypes.windll.kernel32.GetShortPathNameW.restype = ctypes.wintypes.DWORD

buf = ctypes.create_unicode_buffer(1024) # adjust buffer size, if necessary
ctypes.windll.kernel32.GetShortPathNameW(unicode_path, buf, len(buf))

short_path = buf.value
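The ctypes variant can be wrapped in a function that asks Windows for the required buffer size first and checks the return value, instead of guessing 1024. This is only a sketch: the name get_short_path is hypothetical, and the non-Windows branch that returns the path unchanged is an assumption added so the function can be imported anywhere. On volumes where 8.3 name generation is disabled, the call may simply return the long name.

```python
import ctypes
import sys


def get_short_path(long_path):
    """Return the 8.3 form of an existing path on Windows.

    On other platforms the path is returned unchanged (an assumption made
    for portability; short names are a Windows-only concept).
    """
    if sys.platform != "win32":
        return long_path

    from ctypes import wintypes

    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD

    # With a zero-length buffer the call returns the required size,
    # including the terminating null, or 0 on failure.
    size = get_short(long_path, None, 0)
    if size == 0:
        raise ctypes.WinError()

    buf = ctypes.create_unicode_buffer(size)
    if get_short(long_path, buf, size) == 0:
        raise ctypes.WinError()
    return buf.value
```

The result could then be used in the call from the question, for example subprocess.call([u'c:\\windows\\notepad.exe', get_short_path(f)]).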

Answer 3:

It appears that to make this work, the subprocess code would have to be modified to use a wide-character version of CreateProcess (assuming that one exists). PEP 277 (http://www.python.org/dev/peps/pep-0277/) discusses the same change made for the file object. Perhaps you could research the Windows C calls and propose a similar change for subprocess.
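For comparison, and as far as I know, Python 3's subprocess on Windows does go through the wide-character CreateProcessW, so text arguments can be passed as a list without encoding them first. The sketch below assumes Python 3; instead of Notepad it starts a child Python interpreter that reports the argument it received, so it can run on any platform. The helper name echo_via_child is hypothetical.

```python
import subprocess
import sys


def echo_via_child(argument):
    """Pass one text argument to a child Python process; return what it saw."""
    # The child prints an ASCII-only repr of argv[1], so the result does not
    # depend on the console or pipe encoding.
    child_code = "import sys; print(ascii(sys.argv[1]))"
    output = subprocess.check_output(
        [sys.executable, "-c", child_code, argument]
    )
    return output.decode("ascii").strip()


# The accented character is written as an escape to keep this file ASCII-only.
received = echo_via_child("n\u00e8w.txt")
```

If the argument survives the round trip, received equals ascii("n\u00e8w.txt").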

Answer 4:

I don't have an answer for you, but I've done a fair amount of research into this problem. Python converts all output (including system calls) to the same character encoding as the terminal it is running in. Windows terminals use code pages for character mapping; the default code page is 437 (on US English systems), but it can be changed with the chcp command. chcp 65001 will theoretically change the code page to UTF-8, but as far as I know Python doesn't know what to do with this, so you're SOL.

Answer 5:

As ΤΖΩΤΖΙΟΥ and starbuck mentioned, the problem is with the console code page, which in your case is 866 (in the Russian localization of Windows) and not 1251. Just run chcp in the console to check.

The problem is the same as when you want to output Unicode to the Windows console. Unfortunately, you will need at least to register an alias for unicode as 'cp866' in encodings\aliases.py (or do it programmatically on script start), change the code page of the console to 65001 before running Notepad, and set it back afterwards.

chcp 65001 & c:\WINDOWS\notepad.exe nèw.txt & chcp 866

By the way, to be able to run the command in the console and see the filename correctly, you will need to change the console font to Lucida Console in the console window properties.

It might be even worse: you may need to change the code page of the current process. To do that, you will need to either run chcp 65001 right before the script starts or use pywin32 to do it within the script.

Answer 6:

You can try opening the file as:

subprocess.call((n + f).encode("cp437"))

or whichever codepage chcp reports as being used in a command prompt window. If you try to chcp 65001 as starbuck suggested, you'll have to edit the stdlib encodings\aliases.py file and add cp65001 as an alias to 'utf-8' beforehand. It's an open issue in the Python source.

UPDATE: since this is a multiple target scenario, before running such a command, make sure you run a single chcp command first, analyse the output and retrieve the current "Command Prompt" (DOS) codepage. Subsequently, use the discovered codepage to encode the subprocess.call argument.
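The procedure in the update can be sketched as follows. The function names are hypothetical, and the sample text "Active code page: 437" is what an English-language Windows prints; because the wording differs between localizations, the parser only looks for the last number in the output. current_console_codec runs chcp and is therefore Windows-only; it is defined here but not called.

```python
import re
import subprocess


def codepage_to_codec(chcp_output):
    """Turn chcp output such as 'Active code page: 437' into 'cp437'."""
    match = re.search(r"(\d+)\D*$", chcp_output.strip())
    if match is None:
        raise ValueError("no code page number found in %r" % chcp_output)
    return "cp" + match.group(1)


def current_console_codec():
    """Windows only: run chcp and return the matching Python codec name."""
    raw = subprocess.check_output("chcp", shell=True)
    return codepage_to_codec(raw.decode("ascii", "replace"))


codec = codepage_to_codec("Active code page: 437\r\n")
command = (u"c:\\windows\\notepad.exe " + u"c:\\temp\\n\u00e8w.txt").encode("cp850")
```

On Windows, the encoded command would be built with current_console_codec() in place of the fixed "cp850" used here for illustration, and then passed to subprocess.call.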