Printing Arabic/Persian letters in Python 2.7 [dup

Posted 2019-04-10 10:46

Question:

This question already has an answer here:

  • Why do I get the u“xyz” format when I print a list of unicode strings in Python? 3 answers

Python doesn't seem to print Arabic letters correctly in the code below. Any ideas?

#!/usr/bin/python
# -*- coding: utf-8 -*-

import nltk
sentence = "ورود ممنوع"

tokens = nltk.word_tokenize(sentence)

print tokens

The result is:

>>> 
['\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf', '\xd9\x85\xd9\x85\xd9\x86\xd9\x88\xd8\xb9']
>>> 

I also tried adding a u before the string, but it didn't help:

>>> u"ورود ممنوع"
>>> 
['\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf', '\xd9\x85\xd9\x85\xd9\x86\xd9\x88\xd8\xb9']

Answer 1:

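Your results are correct. The list contains UTF-8 encoded byte strings, and printing a list shows the repr of each element rather than the text itself. Printing the elements one by one displays the letters: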

>>> lst = ['\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf',
           '\xd9\x85\xd9\x85\xd9\x86\xd9\x88\xd8\xb9']
>>> for l in lst:
...  print l
... 
ورود
ممنوع

To convert the byte strings to unicode, you can use a list comprehension:

>>> lst = [e.decode('utf-8') for e in lst]
>>> lst
[u'\u0648\u0631\u0648\u062f', u'\u0645\u0645\u0646\u0648\u0639']
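The list repr above still shows escape sequences, so here is a minimal sketch of one way to get readable output: decode each token, then join the tokens into a single unicode string and print that instead of the list. The helper name decode_tokens is hypothetical and not part of nltk, and the code assumes the tokens are UTF-8 encoded, as in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# UTF-8 encoded byte strings, copied from the tokenizer output in the question.
byte_tokens = [b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf',
               b'\xd9\x85\xd9\x85\xd9\x86\xd9\x88\xd8\xb9']


def decode_tokens(tokens, encoding='utf-8'):
    """Hypothetical helper: decode every byte-string token to unicode text."""
    return [token.decode(encoding) for token in tokens]


text_tokens = decode_tokens(byte_tokens)

# Printing a string shows the characters themselves; printing the list
# would show the repr of each element instead.
joined = ' '.join(text_tokens)
print(joined)
```

Whether the letters display correctly still depends on the terminal: in Python 2.7, printing unicode encodes it with the console's encoding, so a console that is not set to UTF-8 (or another encoding that covers Arabic) may raise UnicodeEncodeError.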

Printing Unicode Char inside a List