I'm trying to run subprocess.call() with a Unicode filename. Here is a simplified version of the problem:
import subprocess
n = u'c:\\windows\\notepad.exe '
f = u'c:\\temp\\nèw.txt'
subprocess.call(n + f)
which raises the famous error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'
Encoding to utf-8 produces the wrong filename, and mbcs passes the filename as new.txt, without the accent.
I just can't read any more on this confusing subject; I keep going in circles. I have found a lot of answers here for many different problems in the past, so I thought I would join and ask for help myself.
Thanks
I found a fine workaround; it's a bit messy, but it works.
subprocess.call is going to pass the text in its own encoding to the terminal, which might or might not be the one the terminal is expecting. Because you want to make it portable, you'll need to know the machine's encoding at runtime.
The idea is to figure out the current encoding at runtime and apply it to the argument passed to subprocess.call.
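A minimal sketch of that idea, not the original snippet. It assumes that sys.getfilesystemencoding() is a reasonable guess at the machine's encoding, and encode_for_subprocess is a hypothetical helper name. A character that the chosen encoding cannot represent will still fail or be altered.

```python
import sys


def encode_for_subprocess(text, encoding=None):
    """Encode a text command line for subprocess.call (hypothetical helper)."""
    # Assumption: the filesystem encoding is what the receiving side expects.
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return text.encode(encoding)


if __name__ == "__main__":
    n = u'c:\\windows\\notepad.exe '
    f = u'c:\\temp\\n\xe8w.txt'
    encoded = encode_for_subprocess(n + f)
    # On Python 2 under Windows, this is where subprocess.call(encoded)
    # would go; here the result is only printed.
    print(repr(encoded))
```

Passing the encoding explicitly, for example encode_for_subprocess(n + f, 'cp1252'), lets you try other candidates without touching the rest of the script.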
As a side note, I have also found that if you compose a string from the current directory, Python (or the OS, I don't know which) will mess up directories with accented characters. The fix I found to prevent this is very similar to the solution above.
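A rough sketch of what such a fix could look like, assuming the problem is a byte-string path that has to be decoded before it is joined with Unicode text. decode_path is a hypothetical helper, and os.getcwdb() is the Python 3 spelling; on Python 2, os.getcwd() already returns a byte string.

```python
import os
import sys


def decode_path(raw_path, encoding=None):
    """Turn a byte-string path into text (hypothetical helper)."""
    # Assumption: the path was produced in the filesystem encoding.
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    if isinstance(raw_path, bytes):
        return raw_path.decode(encoding)
    return raw_path  # already text, nothing to do


if __name__ == "__main__":
    cwd = decode_path(os.getcwdb())
    # Both parts are text now, so joining them does not trigger an
    # implicit ASCII conversion.
    print(repr(os.path.join(cwd, u'n\xe8w.txt')))
```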
Hope it helps.
As ΤΖΩΤΖΙΟΥ and starbuck mentioned, the problem is with the console code page, which in your case is 866 (in the Russian localization of Windows) and not 1251. Just run chcp in the console.
The problem is the same as when you want to output Unicode to the Windows console. Unfortunately, you will need at least to register an alias for unicode as 'cp866' in encodings\aliases.py (or do it programmatically at script start), change the code page of the console to 65001 before running notepad, and set it back afterwards.
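The "do it programmatically" option can be sketched with codecs.register, which avoids editing encodings\aliases.py. This is only an illustration: register_codec_alias is a hypothetical helper, and which alias and target you actually need depends on your Python version and console.

```python
import codecs


def register_codec_alias(alias, target):
    """Make `alias` resolve to the existing codec `target` (hypothetical helper)."""
    target_info = codecs.lookup(target)
    wanted = alias.lower()

    def search(name):
        # codecs passes the normalized (lowercase) encoding name here.
        return target_info if name == wanted else None

    codecs.register(search)


if __name__ == "__main__":
    # Example: let 'cp65001' behave like 'utf-8' when Python does not know it.
    register_codec_alias('cp65001', 'utf-8')
    print(repr(u'n\xe8w.txt'.encode('cp65001')))
```

Python's built-in codec search runs first, so on interpreters that already know the alias this registration has no effect.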
By the way, to be able to run the command in the console and see the filename correctly, you will need to change the console font to Lucida Console in the console window properties.
It might be even worse: you will need to change the code page of the current process. To do that, you will need either to run chcp 65001 right before the script starts or to use pywin32 to do it within the script.
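Changing the code page and setting it back afterwards could look roughly like this. Note that the sketch uses ctypes with the Win32 console functions instead of pywin32, and console_code_page is a hypothetical helper. Whether the switch is enough for the launched program is not verified here.

```python
import contextlib
import ctypes
import sys


@contextlib.contextmanager
def console_code_page(code_page):
    """Temporarily switch the console code page (hypothetical helper)."""
    if sys.platform != "win32":
        yield  # no console code pages outside Windows
        return
    kernel32 = ctypes.windll.kernel32
    old_in = kernel32.GetConsoleCP()
    old_out = kernel32.GetConsoleOutputCP()
    kernel32.SetConsoleCP(code_page)
    kernel32.SetConsoleOutputCP(code_page)
    try:
        yield
    finally:
        # Restore the previous code pages even if the body raised.
        kernel32.SetConsoleCP(old_in)
        kernel32.SetConsoleOutputCP(old_out)


if __name__ == "__main__":
    with console_code_page(65001):
        pass  # run the subprocess here
```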
I don't have an answer for you, but I've done a fair amount of research into this problem. Python converts all output (including system calls) to the same character encoding as the terminal it is running in. Windows terminals use code pages for character mapping; the default code page is 437, but it can be changed with the chcp command. Running chcp 65001 will theoretically change the code page to utf-8, but as far as I know Python doesn't know what to do with this, so you're SOL.
It appears that to make this work, the subprocess code would have to be modified to use a wide character version of CreateProcess (assuming that one exists). There's a PEP discussing the same change made for the file object at http://www.python.org/dev/peps/pep-0277/. Perhaps you could research the Windows C calls and propose a similar change for subprocess.
If your file exists, you can use its short filename (a.k.a. 8.3 name). This name is defined for existing files and should cause no trouble to non-Unicode-aware programs when passed as an argument.
One way to obtain one requires Pywin32 to be installed.
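A minimal sketch of the Pywin32 route, assuming win32api.GetShortPathName is the call in question. short_path_name_pywin32 is a hypothetical wrapper name, and the import is kept inside the function so the module still loads where Pywin32 is missing.

```python
def short_path_name_pywin32(long_path):
    """Return the 8.3 name of an existing file (hypothetical helper, Windows only)."""
    # Imported here so that merely defining the helper does not need Pywin32.
    import win32api
    return win32api.GetShortPathName(long_path)


if __name__ == "__main__":
    # The file must already exist, otherwise the call fails.
    print(short_path_name_pywin32(u'c:\\temp\\n\xe8w.txt'))
```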
Alternatively, you can also use ctypes.
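A sketch of the ctypes route, assuming the Win32 function GetShortPathNameW from kernel32 is what is wanted. short_path_name_ctypes is a hypothetical helper and only works on Windows, for a file that already exists.

```python
import ctypes
import sys


def short_path_name_ctypes(long_path):
    """Return the 8.3 name of an existing file (hypothetical helper, Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetShortPathNameW is only available on Windows")
    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint]
    get_short.restype = ctypes.c_uint
    # With an empty buffer the call reports the required size in characters,
    # including the terminating null.
    needed = get_short(long_path, None, 0)
    if needed == 0:
        raise ctypes.WinError()
    buf = ctypes.create_unicode_buffer(needed)
    if get_short(long_path, buf, needed) == 0:
        raise ctypes.WinError()
    return buf.value
```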
You can try opening the file with the command line encoded in whichever codepage chcp reports as being used in a command prompt window. If you try to chcp 65001 as starbuck suggested, you'll have to edit the stdlib encodings\aliases.py file and add cp65001 as an alias to 'utf-8' beforehand. It's an open issue in the Python source.
UPDATE: since this is a multiple target scenario, before running such a command, make sure you run a single chcp command first, analyse the output and retrieve the current "Command Prompt" (DOS) codepage. Subsequently, use the discovered codepage to encode the subprocess.call argument.
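The steps in the update could be sketched as follows. Both function names are hypothetical, and the parsing assumes that the code page number is the last run of digits in the chcp output (for example "Active code page: 866"), since the surrounding words are localized.

```python
import re
import subprocess


def parse_chcp_output(output):
    """Extract a codec name such as 'cp866' from chcp output (hypothetical helper)."""
    numbers = re.findall(r"\d+", output)
    if not numbers:
        raise ValueError("no code page number in %r" % (output,))
    return "cp" + numbers[-1]


def console_encoding():
    """Run chcp once and return the console codec name (Windows only)."""
    raw = subprocess.check_output("chcp", shell=True)
    # The digits are plain ASCII in any code page, so a lossy decode is enough.
    return parse_chcp_output(raw.decode("ascii", "replace"))


if __name__ == "__main__":
    command = u'c:\\windows\\notepad.exe c:\\temp\\n\xe8w.txt'
    # Encoding raises UnicodeEncodeError if the code page lacks a character.
    print(repr(command.encode(console_encoding())))
```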