I have a file named "SSE-Künden, SSE-Händler.pdf",
which contains two non-ASCII characters (ü and ä).
When I print this file name in the Python interpreter, the Unicode characters are converted into escaped values (their ASCII values, I guess): 'SSE-K\x81nden, SSE-H\x84ndler.pdf'
But I want the original name to be displayed.
The test directory contains the PDF file named 'SSE-Künden, SSE-Händler.pdf'.
I tried this:
import os
path = r'C:\test'  # raw string, so that \t is not read as a tab
for a,b,c in os.walk(path):
    print c
['SSE-K\x81nden, SSE-H\x84ndler.pdf']
How do I convert these escaped characters back to their respective Unicode values? I want to show the original name ("SSE-Künden, SSE-Händler.pdf") in the interpreter and also write it to a file as is. How do I achieve this? I am using Python 2.6 on Windows.
Thanks.
Assuming your terminal supports displaying the characters, iterate over the list of files and print them individually (or use Python 3, which displays Unicode in lists):
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk(u'.'):
...     for n in f:
...         print n
...
SSE-Künden, SSE-Händler.pdf
Also note I used a Unicode string (u'.') for the path. This instructs os.walk to return Unicode strings as opposed to byte strings. When dealing with non-ASCII filenames this is a good idea.
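To make the difference between byte strings and Unicode strings concrete, here is a minimal sketch. It assumes the bytes shown in the question come from the cp850 console code page (an assumption; \x81 and \x84 are ü and ä in both cp437 and cp850), and the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
# Byte string exactly as it appears in the question's output.
raw_name = b'SSE-K\x81nden, SSE-H\x84ndler.pdf'

# Assumption: the bytes are cp850-encoded. Decoding yields a Unicode string.
name = raw_name.decode('cp850')

# \xfc and \xe4 are the Unicode code points of ü and ä.
expected = u'SSE-K\xfcnden, SSE-H\xe4ndler.pdf'

print(name == expected)  # True
```

Passing a Unicode path to os.walk avoids this manual decoding step, because the names already arrive as Unicode strings.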
In Python 3, strings are Unicode by default, and non-ASCII characters are displayed as they are instead of as escape codes:
Python 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk('.'):
...     print(f)
...
['SSE-Künden, SSE-Händler.pdf']
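for a,b,c in os.walk(path):
    for n in c:
        # The bytes shown in the question are not valid UTF-8; \x81 and \x84
        # are ü and ä in the OEM code pages cp437/cp850, so decode with one of those.
        print n.decode('cp850')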
For writing to a file: http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
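As a rough sketch of what that section of the HOWTO describes, the name can be written with an explicit encoding using io.open, which is available from Python 2.6 onwards. The output file name and the choice of UTF-8 are assumptions for illustration, not something the question specifies:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Unicode file name, as os.walk returns it when given a Unicode path.
name = u'SSE-K\xfcnden, SSE-H\xe4ndler.pdf'

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), 'filenames.txt')

# io.open encodes the Unicode text to UTF-8 bytes on write.
with io.open(out_path, 'w', encoding='utf-8') as out:
    out.write(name + u'\n')

# Reading back with the same encoding returns the original text.
with io.open(out_path, 'r', encoding='utf-8') as f:
    read_back = f.read().strip()

print(read_back == name)  # True
```

Whatever encoding is chosen for writing has to be used again when the file is read, otherwise the characters will come back garbled.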