Read a text file with non-ASCII characters in an u

Posted 2019-01-24 03:00

Question:

I want to read a file that contains German characters, not only ASCII ones. I found that I can do it like this:

>>> import codecs
>>> f = codecs.open('file.txt', 'r', encoding='UTF-8')
>>> lines = f.readlines()

This works when I run my script in Python IDLE, but when I run it from somewhere else it does not give the correct result. Any ideas?

Answer 1:

You need to know which character encoding the text is encoded in. If you don't know that beforehand, you can try guessing it with the chardet module. First install it:

$ pip install chardet

Then detect the encoding, for example by reading the file in binary mode:

>>> import chardet
>>> chardet.detect(open("file.txt", "rb").read())
{'confidence': 0.9690625, 'encoding': 'utf-8'}

Then open the file with the detected encoding:

>>> import codecs
>>> with codecs.open('file.txt', 'r', encoding='utf-8') as f:
...     lines = f.readlines()
...
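In Python 3 the built-in open() also accepts an encoding argument, so codecs.open() is not strictly required. If installing chardet is not an option, a cruder stdlib-only alternative (a different technique from chardet's statistical detection) is to try a short list of candidate encodings in order and keep the first one that decodes without error. The sketch below uses a hypothetical helper name, read_lines_with_fallback, and a candidate list chosen as an assumption for German text. Because cp1252 and latin-1 accept almost any byte sequence, a clean decode does not prove the guess is right.

```python
from pathlib import Path

# Hypothetical candidate list for German text: UTF-8 first, then the common
# Windows Western European code page. latin-1 maps every byte, so it never
# fails and acts as a last resort.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def read_lines_with_fallback(path, encodings=CANDIDATES):
    """Return (lines, encoding) for the first encoding that decodes cleanly."""
    raw = Path(path).read_bytes()  # read the bytes once, decode in memory
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are invalid in this encoding; try the next one
        # keepends=True keeps the line endings, much like readlines()
        return text.splitlines(keepends=True), enc
    raise ValueError(f"none of {encodings} could decode {path!r}")
```

For a file saved as UTF-8 this reports 'utf-8'. For the same German text saved as cp1252, the UTF-8 attempt fails on the byte 0xFC (ü) and the function falls back to 'cp1252'.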


Answer 2:

I believe the file is being read correctly, but the wrong encoding is used when the text is printed. I say this because you get the correct results in IDLE.
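To test this explanation, one can print the encodings Python picks up in each environment and compare the output from IDLE with the output from wherever the script misbehaves. A minimal sketch; report_encodings is a hypothetical helper name, and the values it reports depend on the operating system, the locale, and the Python version.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings that influence reading and printing text."""
    return {
        # Used by print(); None if stdout was replaced by an object
        # without an encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale encoding that open() typically uses when no encoding=
        # argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Used to encode and decode file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, enc in report_encodings().items():
        print(f"{name:>10}: {enc}")
```

If the "stdout" line differs between IDLE and the other environment, that supports the idea that the problem is on the output side rather than in reading the file.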

I would suggest trying print(line.encode('utf-8')), but I'm not sure Python 3 will display a bytes object the way you want: print() shows the bytes object's repr, such as b'Gr\xc3\xbc\xc3\x9fe', rather than the decoded text.
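To send the UTF-8 bytes themselves to the console in Python 3, one option is to write them to the stream's underlying binary buffer, bypassing the text layer's own encoding. A sketch under that assumption; print_utf8 is a hypothetical helper name. It only helps when the console actually expects UTF-8 but Python chose a different encoding, and some replacement streams (io.StringIO, for example) have no buffer attribute, so it falls back to a plain text write there.

```python
import sys


def print_utf8(text, stream=None):
    """Write text plus a newline to a stream as UTF-8 bytes.

    Bypasses the stream's configured encoding when a binary buffer is
    available; otherwise falls back to an ordinary text write.
    """
    stream = sys.stdout if stream is None else stream
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        stream.write(text + "\n")  # no binary layer: let the stream encode
        return
    stream.flush()  # push out any text already queued in the text wrapper
    buffer.write((text + "\n").encode("utf-8"))
    buffer.flush()
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding='utf-8') is another option when stdout is a regular io.TextIOWrapper; it changes the encoding used by every later print() call.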