Weird characters while reading file content

Posted 2019-07-17 06:29

Question:

I'm not sure what is wrong:

for line in open(textfile, 'r'):
    print(line)

Output:

abcd

The file was created in Notepad++ with Unix EOL and UTF-8 encoding.

It now works properly after I saved the file with the UTF-8 without BOM encoding option in Notepad++. But why? And how can I convert all the files I receive to UTF-8 so the weird characters don't appear?

Answer 1:

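Specifying the encoding will solve your problem. The file starts with a UTF-8 byte order mark (BOM), and the utf-8-sig codec skips it when reading.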

for line in open(textfile, 'r', encoding='utf-8-sig'):
    print(line)

utf_8_sig: UTF-8 codec with BOM signature
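To see why the codec matters, here is a minimal sketch that decodes the same bytes three ways. The sample bytes are made up for illustration, and cp1252 is only an assumption about what the platform default might have been (open() without an encoding argument uses a platform-dependent default, often cp1252 on Windows).

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by the text "abcd".
raw = codecs.BOM_UTF8 + "abcd\n".encode("utf-8")

# Plain utf-8 keeps the BOM as an invisible U+FEFF character at the start.
as_utf8 = raw.decode("utf-8")

# utf-8-sig drops the BOM if it is present.
as_sig = raw.decode("utf-8-sig")

# A single-byte codec such as cp1252 turns the three BOM bytes
# (EF BB BF) into three visible characters.
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8))
print(repr(as_sig))
print(repr(as_cp1252))
```

utf-8-sig also reads files that have no BOM, so it is a reasonable choice when you don't know which kind of UTF-8 file you will get.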



Answer 2:

You must specify the encoding when you read the file. Because the file starts with a BOM, use utf-8-sig rather than plain utf-8.

Pass an encoding keyword argument to open(). From:

for line in open(textfile, 'r'):
    print(line)

to:

for line in open(textfile, 'r', encoding='utf-8-sig'):
    print(line)
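As for converting the files you receive, one possible approach is to read each file with utf-8-sig and write it back as plain utf-8. This is only a sketch: rewrite_without_bom is a hypothetical helper name, it assumes the incoming files really are UTF-8 (with or without a BOM), and it overwrites the file in place.

```python
import os
import tempfile


def rewrite_without_bom(path):
    """Rewrite a UTF-8 file in place, dropping a leading BOM if present."""
    # newline="" leaves line endings untouched (Unix EOL stays Unix EOL).
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Small demonstration on a temporary file that starts with a BOM.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbfabcd\n")

rewrite_without_bom(demo_path)

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result)
```

If you only need to read the files and not change them, passing encoding='utf-8-sig' to open() as shown above is enough.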