Using Hebrew in Python

Posted 2019-07-07 06:38

Question:

I have a problem printing Hebrew words. I am using the Counter class from the collections module to count the words in a given text (which is in Hebrew). The counter does count the words, and it handles the language because I am using # -*- coding: utf-8 -*-

The problem is that when I print my counter, I get weird symbols (I am using Eclipse). Here is the code and the output:

# -*- coding: utf-8 -*-
import string
from collections import Counter
class classifier:
    def __init__(self, filename):
        self.myFile = open(filename)
        self.cnt = Counter()

    def generateList(self):
        exclude = set(string.punctuation)
        for lines in self.myFile:
            for word in lines.split():
                if word not in exclude:
                    nWord = ""
                    for letter in word:
                        if letter in exclude:
                            letter = ""
                            nWord += letter
                        else:
                            nWord += letter
                    self.cnt[nWord] += 1
        print self.cnt

Output:

Counter({'\xd7\x97\xd7\x94': 465, '\xd7\x96\xd7\x95': 432, '\xd7\xa1\xd7\x92\xd7\x95\xd7\xa8': 421, '\xd7\x94\xd7\x92\xd7\x91': 413})

Any idea how to print the words correctly?

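Answer 1:

The "weird symbols" are how Python 2 displays byte strings: each Hebrew word is stored as its UTF-8 encoded bytes, and printing the Counter shows the repr of each key, with those bytes written as \x escapes.

You need to decode them to get readable text, for example: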

>>> print '\xd7\x97\xd7\x94'.decode('UTF8')
חה
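
To print the whole Counter readably, the same decoding can be applied to every key. The following is a minimal sketch, assuming the keys are UTF-8 encoded byte strings like those in the output above. The helper name decode_counter is hypothetical, and the byte literals carry a b prefix so the snippet also runs on Python 3:

```python
from collections import Counter


def decode_counter(byte_counter, encoding="utf-8"):
    """Return a new Counter whose keys are decoded text instead of raw bytes.

    decode_counter is a hypothetical helper, not part of the original code.
    """
    decoded = Counter()
    for word, count in byte_counter.items():
        decoded[word.decode(encoding)] += count
    return decoded


# Two of the byte-string keys copied from the output in the question.
raw = Counter({b'\xd7\x97\xd7\x94': 465, b'\xd7\x96\xd7\x95': 432})

# Print one word per line, most frequent first, instead of the Counter's repr.
for word, count in decode_counter(raw).most_common():
    print(u"%s %d" % (word, count))
```

Printing each word on its own avoids the repr that print uses for containers. Decoding once while reading the file, for example by opening it with io.open(filename, encoding='utf-8'), would likely avoid the byte-string keys altogether.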
