'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte

Posted 2019-01-23 19:43

According to the SEC, the data set is provided in a single encoding, as follows:

Tab Delimited Value (.txt): utf-8, tab-delimited, \n-terminated lines, with the first line containing the field names in lowercase.

My current code:

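import csv

# open() without an encoding argument uses the platform's preferred encoding (UTF-8 here, judging by the error)
# newline='' is what the csv module documentation recommends for files passed to csv readers
with open('txt.tsv', newline='') as tsvfile:
    reader = csv.DictReader(tsvfile, dialect='excel-tab')
    for row in reader:
        print(row)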

All attempts ended with the following error message:

'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte

I am a bit lost. Can anyone help me? Many thanks in advance.
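
One way to see exactly where the file stops being valid UTF-8 is to read it as raw bytes and let a strict UTF-8 decode report the offset. This is a minimal sketch: `find_first_invalid_utf8` is a hypothetical helper name, and it assumes the whole file fits in memory. The position in the error above may be relative to the chunk that `open()` was decoding at the time rather than to the start of the file, so the offset reported here can differ.

```python
def find_first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode('utf-8')  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start, data[exc.start]
    return None


# Made-up sample: 0xa0 is a non-breaking space in windows-1252
sample = b'price:\xa0100'
print(find_first_invalid_utf8(sample))  # (6, 160)
```

On the real file, pass the raw contents, e.g. `find_first_invalid_utf8(open('txt.tsv', 'rb').read())`; the `'rb'` mode skips text decoding entirely.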

Tags: python csv encoding utf-8

2 answers

Reply #2 · 2019-01-23 20:29

If you are working with Turkish data, I suggest this line:

import pandas as pd
df = pd.read_csv("text.txt", sep='\t', encoding='windows-1254')  # Turkish code page; the file is tab-delimited
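
Choosing the right Windows code page matters even when no exception is raised. windows-1252 (Western European) and windows-1254 (Turkish) agree on most byte values but assign different letters to a few of them, including the six below, so decoding Turkish text as windows-1252 silently yields the wrong characters. The byte values are picked purely for illustration.

```python
# Bytes whose meaning differs between windows-1252 and windows-1254.
raw = bytes([0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE])

as_1252 = raw.decode('windows-1252')  # Western European reading
as_1254 = raw.decode('windows-1254')  # Turkish reading

print(as_1252)  # ÐÝÞðýþ
print(as_1254)  # ĞİŞğış
```

Neither call raises, which is why a wrong-but-close code page is harder to notice than a UTF-8 decode error.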

Reply #3 · 2019-01-23 20:40

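The file's encoding is 'windows-1252', not UTF-8. In windows-1252, byte 0xa0 is a non-breaking space, but in UTF-8 it can only appear as a continuation byte, never as the first byte of a character, which is why the decoder reports "invalid start byte". Pass the encoding explicitly: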

open('txt.tsv', encoding='windows-1252')
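
Combined with the question's code, a sketch that tries UTF-8 first (as the SEC documentation states) and falls back to windows-1252 might look like this. `read_tsv` is a hypothetical helper name, and the fallback order is an assumption, not part of the original answer.

```python
import csv


def read_tsv(path, encodings=('utf-8', 'windows-1252')):
    """Read a tab-delimited file into a list of dicts.

    Each encoding is tried in order; the first one that decodes the whole
    file without a UnicodeDecodeError is used.
    """
    if not encodings:
        raise ValueError('at least one encoding is required')
    last_error = None
    for encoding in encodings:
        try:
            with open(path, encoding=encoding, newline='') as tsvfile:
                # list() forces the whole file to be decoded inside the try block
                return list(csv.DictReader(tsvfile, dialect='excel-tab'))
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error
```

Usage: `for row in read_tsv('txt.tsv'): print(row)`. Note that windows-1252 maps almost every byte to some character, so the fallback rarely fails; if the file is really in another code page (such as windows-1254), it will decode without error but with some wrong letters.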
