Reading Japanese filenames in windows, using Pytho

2020-08-03 04:45发布

问题:

I just setup PortablePython on my system, so I can run python scripts from PHP and I got some very basic code (Below) to list all the files in a directory, however it doesn't work with Japanese filenames. It works fine with English filenames, but it spits out errors (Below) when I put any file containing Japanese characters in the directory.

import os, glob

path = 'G:\path'
for infile in glob.glob( os.path.join(path, '*') ):
    print("current file is: ", infile)

It works fine using 'PyScripter-Portable.exe', however when I try to run 'PortablePython\App\python.exe "test.py"' in the command prompt or from PHP it spits out the following errors:

current file is:  Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print("current file is: ", infile)
  File "PortablePython\App\lib\io.py", line 1494, in write
    b = encoder.encode(s)
  File "PortablePython\App\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 37-40: character maps to <undefined>



I'm very new to Python and am just using this to get around a PHP issue with not being able to read unicode filenames in Windows... So I really need this to work - any help you can give me would be great.

回答1:

Assuming you're using python 2.x, try changing your strings to unicode, like this:

path = u'G:\path'
for infile in glob.glob( os.path.join(path, u'*') ):
    print( u"current file is: ", infile)

That should let python's filesystem-related functions know that you want to work with unicode file names.



回答2:

The problem is probably that whatever output destination you're printing to doesn't use the same encoding as the file system. The general rule is that you should get text into Unicode as soon as possible, and then convert to whatever byte encoding you need upon output (e.g. utf-8).

Since you're dealing with filenames, they should be in the system encoding.

import sys
fse = sys.getfilesystemencoding()
filenames = [unicode(x, fse) for x in glob.glob( os.path.join(path, '*') )]

Now all your filenames are Unicode, and you need to figure out the correct encoding to output from the command prompt or whatever (you can launch a Unicode version of the command prompt with the u flag: "cmd /u")