I am new to Python. I am trying to use os.path.getsize() to obtain a file's size. However, if the file name is not in English but in Chinese, German, French, etc., Python does not recognize it and does not return the size of the file. Could you please help me with this? How can I get Python to recognize the file name and return the size of such files?
For example:
The file's name is: Показатели естественного и миграционного прироста до 2030г.doc
path="C:\xxxx\xxx\xxxx\Показатели естественного и миграционного прироста до 2030г.doc"
I'd like to use:
os.path.getsize(path)
But it does not recognize the file name. Could you please tell me what I should do?
Thank you very much!
Use Unicode filenames and let Python encode the codepoints to the correct encoding for your system.
Alternatively, detect the filesystem encoding yourself, and ensure that your filenames use that specific encoding when you pass them to the os.path.getsize() function.
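A minimal sketch of both options, assuming Python 3 (where every str is already Unicode and the u prefix is optional; in Python 2 the prefix is required). The file name and contents are made up for the demonstration, and the file is created in a temporary directory so the example can run anywhere:

```python
# -*- coding: utf-8 -*-
import os
import shutil
import sys
import tempfile

# Hypothetical name and contents, used only for this demonstration.
file_name = u"Показатели прироста.doc"
payload = b"example"

tmp_dir = tempfile.mkdtemp()
try:
    unicode_path = os.path.join(tmp_dir, file_name)
    with open(unicode_path, "wb") as handle:
        handle.write(payload)

    # Option 1: pass the Unicode string and let Python encode it for the OS.
    size_from_unicode = os.path.getsize(unicode_path)

    # Option 2: encode the path yourself using the filesystem encoding.
    fs_encoding = sys.getfilesystemencoding()
    byte_path = unicode_path.encode(fs_encoding)
    size_from_bytes = os.path.getsize(byte_path)
finally:
    # Remove the temporary directory and the demo file inside it.
    shutil.rmtree(tmp_dir)

# Both values should equal len(payload).
print(size_from_unicode, size_from_bytes)
```

Option 1 is usually the simpler choice, because it leaves the encoding decision to Python instead of to your own code.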
If you do not yet know what Unicode is, or how that relates to encodings, I urge you to read:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
before you continue.
If you are specifying a literal string in your source code, then you need to make sure that you have specified the codec used to save your source, and to use a unicode literal:
# -*- coding: utf-8 -*-
path = u"C:\\xxxx\\xxx\\xxxx\\Показатели естественного и миграционного прироста до 2030г.doc"
This specifies that you saved your source code in UTF-8 and that the path variable should hold a Unicode string (note the u'' string literal).
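To see why the declared codec matters, here is a small sketch showing that the same Unicode text becomes different bytes under different encodings. The choice of cp1251 as the second codec is only an illustrative assumption (it is a common Cyrillic code page), not something taken from the question:

```python
# -*- coding: utf-8 -*-

# One word from the file name in the question, as a Unicode string.
text = u"Показатели"

# The same ten characters produce different byte sequences per codec.
utf8_bytes = text.encode("utf-8")      # two bytes per Cyrillic letter
cp1251_bytes = text.encode("cp1251")   # one byte per Cyrillic letter

# Decoding with the codec that produced the bytes restores the text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong codec yields different, garbled text.
mismatched = utf8_bytes.decode("cp1251", "replace")

print(len(utf8_bytes), len(cp1251_bytes), round_trip == text, mismatched == text)
```

If the coding line does not match how the file was actually saved, Python decodes the literal with the wrong codec, and the resulting path no longer names the file on disk.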
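You can solve your problem with this code (for Python 2, with the source file saved as UTF-8):
# -*- coding: utf-8 -*-
import codecs
import os
# A plain byte-string literal; the backslashes are doubled so they are not read as escapes
path = "C:\\xxxx\\xxx\\xxxx\\Показатели естественного и миграционного прироста до 2030г.doc"
path = codecs.decode(path, 'utf8')  # decode the UTF-8 bytes into a unicode object
os.path.getsize(path)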
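The decode step above only makes sense when the path is a byte string. A hedged sketch of a small wrapper that decodes only in that case follows; size_of is a hypothetical helper name, the default encoding of UTF-8 is an assumption, and the demo file is created in a temporary directory:

```python
# -*- coding: utf-8 -*-
import os
import shutil
import tempfile


def size_of(path, encoding="utf-8"):
    """Return the file size, decoding byte-string paths first (hypothetical helper)."""
    if isinstance(path, bytes):
        # Byte strings are decoded to Unicode before reaching the OS call.
        path = path.decode(encoding)
    return os.path.getsize(path)


# Demonstration with a made-up file name and contents.
demo_dir = tempfile.mkdtemp()
try:
    demo_path = os.path.join(demo_dir, u"прирост.doc")
    with open(demo_path, "wb") as handle:
        handle.write(b"12345")

    size_text = size_of(demo_path)                   # Unicode path
    size_raw = size_of(demo_path.encode("utf-8"))    # UTF-8 encoded byte path
finally:
    shutil.rmtree(demo_dir)

print(size_text, size_raw)
```

Checking the type first avoids the error you get from trying to decode a string that is already Unicode.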