Unicode filename to python subprocess.call() [dupl

I'm trying to run subprocess.call() with a Unicode filename. Here is a simplified version of the problem:

import subprocess

n = u'c:\\windows\\notepad.exe '
f = u'c:\\temp\\nèw.txt'

subprocess.call(n + f)

which raises the famous error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'

Encoding to utf-8 produces the wrong filename, and mbcs passes the filename as new.txt, without the accent.
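To make the symptoms concrete, here is a small sketch (not part of the original question) of what a few encodings do to this filename. The choice of cp1252, cp437 and cp866 as sample code pages is an assumption for illustration; which code page is actually in use depends on the machine.

```python
# The filename from the question, written with an escape so the source
# file encoding does not matter.
name = u"n\u00e8w.txt"

# UTF-8 turns the accented letter into two bytes; a program that reads
# them in a single-byte code page sees two wrong characters.
utf8_bytes = name.encode("utf-8")

# Western European code pages can represent the letter in one byte,
# but they disagree on which byte it is.
cp1252_bytes = name.encode("cp1252")
cp437_bytes = name.encode("cp437")

# ASCII cannot represent the letter at all, which is the error above.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A Cyrillic code page has no such letter either; with "replace" the
# character is silently lost.
cp866_lossy = name.encode("cp866", "replace")
```

The point is that a byte string is only correct relative to the code page the receiving program assumes, so picking an encoding by trial and error tends to fail on the next machine.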

I just can't read any more on this confusing subject; I keep going in circles. I have found a lot of answers here to many different problems in the past, so I thought I would join and ask for help myself.

Thanks

Tags: python unicode subprocess call

7 answers

Lonely孤独者°

#2 · 2020-02-01 02:13

I found a fine workaround, it's a bit messy, but it works.

subprocess.call is going to pass the text in its own encoding to the terminal, which might or might not be the one the terminal is expecting. Because you want to make it portable, you'll need to know the machine's encoding at runtime.

The following

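import subprocess
import sys

# Use a unicode literal so that .encode() converts text to bytes directly
notepad = u'C:\\Notepad.exe'
subprocess.call([notepad.encode(sys.getfilesystemencoding())])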

looks up the machine's filesystem encoding at runtime and applies it to the argument before it reaches subprocess.call.

As a side note, I have also found that if you compose a string with the current directory, using

os.getcwd()

Python (or the OS, I don't know which) will mess up directories with accented characters. To prevent this, I have found the following to work:

os.getcwd().decode(sys.getfilesystemencoding())

Which is very similar to the solution above.
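One caveat with this approach: it only helps if the filesystem encoding can actually represent the name. A minimal sketch of a check you could run first is below. The helper name round_trips is made up for this example, and comparing a round trip (rather than relying on an exception) is my own choice, since some codecs may substitute a similar-looking character instead of failing.

```python
import sys


def round_trips(text, encoding=None):
    """Return True if text survives encode + decode in the given encoding.

    encoding defaults to the filesystem encoding reported by Python.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        # A lossy or best-fit conversion shows up as a mismatch here.
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The codec refused the character outright.
        return False


name = u"n\u00e8w.txt"
ok_western = round_trips(name, "cp1252")   # code page that contains the letter
ok_cyrillic = round_trips(name, "cp866")   # code page that does not
```

If the check fails, encoding the argument will either raise or quietly change the filename, so a different route (such as the short filename suggested in another answer) is needed.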

Hope it helps.


【Aperson】

#3 · 2020-02-01 02:14

As ΤΖΩΤΖΙΟΥ and starbuck mentioned, the problem is with the console code page, which in your case is 866 (in the Russian localization of Windows) and not 1251. Just run chcp in the console to check.

The problem is the same as when you want to output Unicode to the Windows console. Unfortunately, you will need at least to register an alias for unicode as 'cp866' in encodings\aliases.py (or do it programmatically at script start), change the code page of the console to 65001 before running Notepad, and set it back afterwards.

chcp 65001 & c:\WINDOWS\notepad.exe nèw.txt & chcp 866

By the way, to be able to run the command in the console and see the filename correctly, you will need to change the console font to Lucida Console in the console window properties.

It might be even worse: you may need to change the code page of the current process. To do that, you will need to either run chcp 65001 right before the script starts or use pywin32 to do it within the script.


Deceive 欺骗

#4 · 2020-02-01 02:15

I don't have an answer for you, but I've done a fair amount of research into this problem. Python converts all output (including system calls) to the same character encoding as the terminal it is running in. Windows terminals use code pages for character mapping; the default code page is 437 on US English systems, but it can be changed with the chcp command. chcp 65001 will theoretically change the code page to UTF-8, but as far as I know Python doesn't know what to do with this, so you're SOL.


冷血范

#5 · 2020-02-01 02:22

It appears that to make this work, the subprocess code would have to be modified to use a wide character version of CreateProcess (assuming that one exists). There's a PEP discussing the same change made for the file object at http://www.python.org/dev/peps/pep-0277/ Perhaps you could research the Windows C calls and propose a similar change for subprocess.
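For what it's worth, the wide-character call does exist (CreateProcessW), and Python 3's subprocess module uses it on Windows, so text arguments can be passed without any manual encoding there. Below is a minimal sketch, assuming Python 3. The helper echo_argument is made up for this example; it starts a child Python process rather than Notepad so the result can be checked.

```python
import subprocess
import sys


def echo_argument(arg):
    """Run a child Python process and return the ASCII repr of the
    command-line argument exactly as the child received it."""
    child_code = "import sys; print(ascii(sys.argv[1]))"
    # Passing a list avoids building one command string by hand.
    out = subprocess.check_output([sys.executable, "-c", child_code, arg])
    return out.decode("ascii").strip()


received = echo_argument("n\u00e8w.txt")
```

With the question's paths, the equivalent call would be subprocess.call([n.strip(), f]), again assuming Python 3.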


Rolldiameter

#6 · 2020-02-01 02:23

If your file exists, you can use its short filename (a.k.a. 8.3 name). This name is defined for existing files and should cause no trouble for non-Unicode-aware programs when passed as an argument.

One way to obtain one (needs Pywin32 to be installed):

import win32api
short_path = win32api.GetShortPathName(unicode_path)

Alternatively, you can also use ctypes:

import ctypes
import ctypes.wintypes

ctypes.windll.kernel32.GetShortPathNameW.argtypes = [
    ctypes.wintypes.LPCWSTR, # lpszLongPath
    ctypes.wintypes.LPWSTR, # lpszShortPath
    ctypes.wintypes.DWORD # cchBuffer
]
ctypes.windll.kernel32.GetShortPathNameW.restype = ctypes.wintypes.DWORD

buf = ctypes.create_unicode_buffer(1024) # adjust buffer size, if necessary
ctypes.windll.kernel32.GetShortPathNameW(unicode_path, buf, len(buf))

short_path = buf.value


做自己的国王

#7 · 2020-02-01 02:23

You can try opening the file as:

subprocess.call((n + f).encode("cp437"))

or whichever codepage chcp reports as being used in a command prompt window. If you try to chcp 65001 as starbuck suggested, you'll have to edit the stdlib encodings\aliases.py file and add cp65001 as an alias to 'utf-8' beforehand. It's an open issue in the Python source.

UPDATE: since this is a multiple target scenario, before running such a command, make sure you run a single chcp command first, analyse the output and retrieve the current "Command Prompt" (DOS) codepage. Subsequently, use the discovered codepage to encode the subprocess.call argument.
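The chcp step from the update could be sketched roughly as follows. The function names are made up for this example, and the parsing assumes that the chcp output ends with the code page number (as in the English "Active code page: 437"), possibly followed by punctuation; I have not checked every localization.

```python
import re
import subprocess


def parse_chcp_output(output):
    """Extract the code page number from chcp output and return a
    Python-style codec name such as 'cp437'."""
    # Take the last run of digits, allowing trailing punctuation or newlines.
    match = re.search(r"(\d+)\D*$", output)
    if match is None:
        raise ValueError("no code page number found in %r" % (output,))
    return "cp" + match.group(1)


def console_codec():
    """Ask the Windows command prompt for its current code page.

    Only meaningful on Windows, where the chcp command exists.
    """
    raw = subprocess.check_output("chcp", shell=True)
    # The label text may be localized; only the digits matter here.
    return parse_chcp_output(raw.decode("ascii", "ignore"))


codec = parse_chcp_output("Active code page: 437")
```

On Windows the call would then look like subprocess.call((n + f).encode(console_codec())), with the same caveat as above that the code page has to contain every character in the filename.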

