I want to read a file that also contains German characters, not just plain ASCII. I found that I can do it like this:
>>> import codecs
>>> file = codecs.open('file.txt','r', encoding='UTF-8')
>>> lines= file.readlines()
This works when I run it in Python IDLE, but when I run it from somewhere else it does not give the correct result. Any ideas?
You need to know which character encoding the text is encoded in. If you don't know that beforehand, you can try guessing it with the chardet module. First install it:
$ pip install chardet
Then, for example, read the file in binary mode and pass the raw bytes to chardet.detect:
>>> import chardet
>>> chardet.detect(open("file.txt", "rb").read())
{'confidence': 0.9690625, 'encoding': 'utf-8'}
So then:
>>> import codecs
>>> lines = codecs.open('file.txt', 'r', encoding='utf-8').readlines()
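As a minimal sketch of the same read, assuming the file really is UTF-8: io.open accepts an encoding argument on both Python 2 and Python 3, and a with block closes the file afterwards. The sample text and the temporary file location below are made up for illustration and are not from the question.

```python
import io
import os
import tempfile

# Hypothetical sample data: one line containing German umlauts and an eszett.
sample = u"Gr\u00fc\u00dfe aus K\u00f6ln\n"

# Write the sample as UTF-8 bytes so there is something to read back.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with io.open(path, "wb") as f:
    f.write(sample.encode("utf-8"))

# Decode while reading: each line comes back as text, not raw bytes.
with io.open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()
```

If the encoding passed here does not match the file, the read will typically either raise UnicodeDecodeError or produce wrong characters, which is why detecting or knowing the encoding comes first.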
I believe the file is being read correctly, but the wrong encoding is being used when the text is output. This is based on the fact that you get the proper results in IDLE.
I would suggest trying print(line.encode('utf-8')), but I'm afraid I don't know whether Python 3 will print a bytes object properly.
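To illustrate the output side: in Python 3, print on a bytes object shows its repr (something like b'\xc3\xbc') rather than the decoded characters, so encoding before printing tends not to help there. One alternative, sketched here under the assumptions that you are on Python 3 and that the terminal expects UTF-8, is to write the encoded bytes to sys.stdout.buffer. The helper name write_utf8 and the sample line are made up for this example.

```python
import io
import sys

def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes to a binary stream (default: stdout's buffer)."""
    if stream is None:
        stream = sys.stdout.buffer  # Python 3 only: the byte stream under stdout
    stream.write(text.encode("utf-8"))

# Hypothetical sample line with German characters.
line = "Gr\u00fc\u00dfe\n"

# What print() would display for the encoded bytes: the repr, not the characters.
as_printed = str(line.encode("utf-8"))

# Capture the bytes in memory here; omit the second argument to write to the terminal.
captured = io.BytesIO()
write_utf8(line, captured)
```

Comparing sys.stdout.encoding in IDLE and in the other environment may also show whether the two really differ in their output encoding.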