Getting file path with umlauts from command line a

Posted 2019-08-28 20:02

I am using a batch file to run a Python script with command line arguments. (Yes, I know Python could do this on its own, but I want to understand why this is not working.) One argument is a file path with umlauts (ä, ü, ö). If I type the path in the Windows console by hand, everything works fine. If I pass it from a batch file (run_script.bat) instead and test it with os.path.exists(filepath), the check fails. I have read a lot about encoding and decoding but have not found a solution.

Example:

I wrote the following code and saved it as a Python module named parsepath.py:

import os
import sys


def main():
    # The first command line argument is the file path to check.
    fn = sys.argv[1]
    if os.path.exists(fn):
        # File exists: print its last path component.
        print os.path.basename(fn)
    else:
        print 'Could not read path {}'.format(fn)


if __name__ == '__main__':
    print 'starting'
    main()

I created a folder named c:\täterätä. In a Windows console I type c:\python27\python.exe c:\path\to\parsepath.py "c:\täterätä"

This results in

starting
tõterõtõ

The umlaut ä looks different in the console, but I guess that is just an encoding mismatch between browser and console. Anyway, this works.
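A minimal sketch of why ä might show up as õ here. It assumes that Python 2 receives the argument as a single byte in the ANSI code page (taken to be cp1252) and that the console renders raw bytes with code page 850; both are assumptions about this particular machine, not confirmed facts.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumption: the argument reaches Python 2 as ANSI (cp1252) bytes,
# so the letter ä is the single byte 0xE4.
argv_byte = b'\xe4'

# Interpreted as cp1252, the byte is the intended character ä (U+00E4).
intended = argv_byte.decode('cp1252')

# Assumption: the console draws raw bytes using code page 850,
# where the same byte 0xE4 is a different character, õ (U+00F5).
shown = argv_byte.decode('cp850')

print('intended: U+%04X' % ord(intended))  # intended: U+00E4
print('shown:    U+%04X' % ord(shown))     # shown:    U+00F5
```

If this is what happens, the path bytes themselves are fine and only the display is off, which would fit with os.path.exists succeeding.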

If I put this in the batch file

"C:\python27\python.exe" "C:\path\to\parsepath.py" "c:\täteräta"

and run the batch file it does not work.

starting
Could not read path c:\t+±ter+±ta

It seemed that I had to decode the path somehow. I used an editor that shows the encoding and saved the batch file as "utf-8".

I tried to modify parsepath.py with

...
fn = sys.argv[1]
fn = fn.decode('utf-8')
...

but no luck. There is an error message.

Other encodings that work on a pure command line do not work with a batch file either:

fn = fn.decode('mbcs')

Somehow the batch file changes the characters of the file path, but I do not know how. I found out that the code page of cmd.exe is 850. This is also what sys.stdout.encoding reports (cp850).
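Given that the console code page is 850, one thing that may be worth trying is to store the batch file in that code page instead of UTF-8. The sketch below uses a hypothetical helper, write_batch_file, which is not part of any library. It assumes that cmd.exe decodes the batch file with its active code page.

```python
# -*- coding: utf-8 -*-
import io


def write_batch_file(path, command, codepage='cp850'):
    """Write `command` to `path`, encoded in the given code page.

    Hypothetical helper: `codepage` should match what cmd.exe uses
    (assumed to be cp850, as reported by sys.stdout.encoding).
    """
    with io.open(path, 'w', encoding=codepage) as handle:
        handle.write(command + u'\n')


# Example call (not executed here):
# write_batch_file(
#     u'run_script.bat',
#     u'"C:\\python27\\python.exe" "C:\\path\\to\\parsepath.py" '
#     u'"c:\\t\xe4ter\xe4t\xe4"')
```

In cp850 the letter ä is the single byte 0x84, so the file would no longer contain the two-byte UTF-8 sequence 0xC3 0xA4.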

If I print sys.argv, the file path from pure command line input is:

['parsepath.py', 'c:\\t\xe4ter\xe4t\xe4']

If I print sys.argv when the file path comes from the batch file, it is:

['C:\\path\\to\\parsepath.py','c:\\t+\xf1ter+\xf1t+\xf1']

The difference is obvious: ä is represented as \xe4 on the pure command line, but the batch file produces +\xf1.

Why does the batch file change the characters, and how can I pass the path correctly?
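One possible explanation for the \xf1 byte, sketched under assumptions: the batch file holds ä as the two UTF-8 bytes 0xC3 0xA4, cmd.exe decodes them with code page 850, and Python 2 then receives the result in the ANSI code page (taken to be cp1252). The leading "+" would presumably come from Windows substituting a look-alike for a character that cp1252 lacks; that last step is a guess and is not reproduced by the code.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Bytes a UTF-8 editor writes to disk for the letter ä.
utf8_bytes = u'\xe4'.encode('utf-8')  # b'\xc3\xa4'

# Assumption: cmd.exe decodes the batch file with code page 850,
# turning the two bytes into two separate characters.
as_read_by_cmd = utf8_bytes.decode('cp850')
print(['U+%04X' % ord(ch) for ch in as_read_by_cmd])  # ['U+251C', 'U+00F1']

# Assumption: Python 2 gets argv in cp1252. U+00F1 maps to byte 0xF1;
# U+251C (a box-drawing character) has no cp1252 equivalent.
ansi_byte = as_read_by_cmd[1].encode('cp1252')

# Printed on a cp850 console, byte 0xF1 is drawn as U+00B1 (plus-minus).
console_char = ansi_byte.decode('cp850')
print('U+%04X' % ord(console_char))  # U+00B1
```

This would match both observations: '+\xf1' in sys.argv and "+±" in the console output.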
