I am using a batch file to run a Python script with command-line arguments. (Yes, I know Python could do this on its own, but I want to understand why this is not working.) One argument is a file path containing umlauts (ä, ü, ö). If I type the path into the Windows console by hand, everything works fine. If I pass the same path from a batch file (run_script.bat) and test it with os.path.exists(filepath), the check fails. I have read a lot about encoding and decoding but still have no solution.
Example:
I wrote the following code and saved it as a Python module named parsepath.py:
import os
import sys

def main():
    fn = sys.argv[1]
    if os.path.exists(fn):
        print os.path.basename(fn)
        # file exists
    else:
        print 'Could not read path {}'.format(fn)

if __name__ == '__main__':
    print 'starting'
    main()
I created a folder named c:\täterätä. In a Windows console I typed c:\python27\python.exe c:\path\to\parsepath.py "c:\täterätä"
This results in
starting
tõterõtõ
The umlaut ä looks different in the console, but I guess this is just another encoding problem, this time between browser and console. Anyway, this works.
If I put this in the batch file
"C:\python27\python.exe" "C:\path\to\parsepath.py" "c:\täteräta"
and run the batch file, it does not work:
starting
Could not read path c:\t+±ter+±ta
It seemed that I had to decode the argument somehow. I used an editor that shows the file encoding and saved the batch file as UTF-8.
I tried to modify parsepath.py with
...
fn = sys.argv[1]
fn = fn.decode('utf-8')
...
but no luck. There is an error message.
Other encodings that work on a pure command line do not work with a batch file either:
fn = fn.decode('mbcs')
Somehow the batch file changes the characters of the file path, but I do not know in which way. I found out that the code page of cmd.exe is 850. This is also what sys.stdout.encoding reports (cp850).
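For reference, this is roughly how the encodings can be inspected from inside the script. It is only a minimal sketch: describe_encodings is a helper name made up for this post, and the values it reports depend on the machine and on how the script is started.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for the current process."""
    return {
        # Encoding of the console stream (may be None if output is redirected).
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding Python uses for file names.
        'filesystem': sys.getfilesystemencoding(),
        # Encoding suggested by the user's locale settings.
        'preferred': locale.getpreferredencoding(),
    }


if __name__ == '__main__':
    for name, value in sorted(describe_encodings().items()):
        print('{0}: {1}'.format(name, value))
    # repr() shows the raw contents of the arguments, independent of the console.
    print(repr(sys.argv))
```

Running this once from the console and once from the batch file should show whether only the argument differs or the reported encodings differ as well.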
If I print sys.argv when the path is typed directly on the command line, I get:
['parsepath.py', 'c:\\t\xe4ter\xe4t\xe4']
If I print sys.argv when the script is started from the batch file, I get:
['C:\\path\\to\\parsepath.py','c:\\t+\xf1ter+\xf1t+\xf1']
The difference is obvious: ä is represented as \xe4 when typed on the command line, but the batch file produces +\xf1.
Why?
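One possible explanation can be sketched with Python's codecs. This is a guess, not a confirmed diagnosis. It assumes the batch file stores ä as UTF-8 and that cmd.exe reads those bytes as cp850. The helpers misread and code_points are names made up for this sketch. The code is written to run on both Python 2.7 and Python 3.

```python
def misread(text, written_as, read_as):
    """Encode text in one encoding, then decode the bytes in another."""
    return text.encode(written_as).decode(read_as)


def code_points(text):
    """Return the code points of text as strings that print on any console."""
    return ['U+{0:04X}'.format(ord(ch)) for ch in text]


A_UMLAUT = u'\u00e4'  # the letter ä

# Assumption: the batch file holds ä as the UTF-8 bytes C3 A4, and cmd.exe
# interprets each byte as a cp850 character, giving two characters.
seen_by_cmd = misread(A_UMLAUT, 'utf-8', 'cp850')

# The second of those characters is ñ, which is byte F1 in cp1252.
# That would match the \xf1 seen in sys.argv.
second_as_ansi = seen_by_cmd[1].encode('cp1252')

# How a cp850 console displays the raw bytes found in sys.argv.
shown_for_e4 = b'\xe4'.decode('cp850')
shown_for_f1 = b'\xf1'.decode('cp850')

if __name__ == '__main__':
    print(code_points(seen_by_cmd))
    print(repr(second_as_ansi))
    print(code_points(shown_for_e4))
    print(code_points(shown_for_f1))
```

If this is right, the first of the two characters (a box-drawing character, U+251C) has no cp1252 equivalent, and the + in the output is presumably what Windows substitutes for it. That last step is an assumption that the sketch does not verify. The two display lines would also explain why \xe4 shows up as õ and \xf1 as ± in the cp850 console.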