Python Pandas Error tokenizing data

I'm trying to use pandas to manipulate a .csv file but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I have tried to read the pandas docs, but found nothing.

My code is simple:

import pandas as pd

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)

How can I resolve this? Should I use the csv module or another language?

The file is from Morningstar.

Tags: python csv pandas

24 answers

余欢

#2 · 2019-01-01 00:49

Sometimes the problem is not how you use Python but the raw data itself.
I got this error message:

Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.

It turned out that in the column description there were sometimes commas. This means that the CSV file needs to be cleaned up or another separator used.
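One way to locate such rows before deciding how to clean the file is to count the fields per row with the standard csv module. The following is only a sketch: the helper name find_ragged_rows and the sample data are made up for illustration, and it assumes the most common field count is the correct one.

```python
import csv
import io
from collections import Counter


def find_ragged_rows(text, delimiter=","):
    """Return (expected_field_count, [(row_number, field_count), ...])."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Assumption: the most common field count is the intended one.
    counts = Counter(len(row) for row in rows if row)
    expected = counts.most_common(1)[0][0]
    bad = [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if row and len(row) != expected
    ]
    return expected, bad


# Illustrative data only: row 3 has an unquoted comma in the description,
# row 4 has the same kind of comma but is quoted correctly.
sample = (
    "id,description,price\n"
    "1,widget,9.99\n"
    "2,large, heavy widget,19.99\n"
    '3,"small, light widget",4.99\n'
)

expected, bad = find_ragged_rows(sample)
print(expected, bad)
```

Rows reported here either need their free-text fields quoted, or the file needs to be exported again with a separator that does not occur in the data.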


梦醉为红颜

#3 · 2019-01-01 00:51

Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value for kwarg compression resolved my problem.

result = pandas.read_csv(data_source, compression='gzip')
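A situation where this can matter is a gzip-compressed file whose name does not end in .gz, so the compression cannot be inferred from the name. The sketch below reproduces that setup with a temporary file; the file name and the numbers in it are invented for illustration.

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a small gzip-compressed CSV to a path WITHOUT a ".gz" suffix,
# so the compression cannot be guessed from the file name.
tmp_dir = tempfile.mkdtemp()
data_source = os.path.join(tmp_dir, "ratios.dat")  # hypothetical file name
with gzip.open(data_source, "wt", encoding="utf-8", newline="") as fh:
    fh.write("year,revenue\n2012,100\n2013,120\n")  # made-up values

# Stating the compression explicitly lets pandas decompress before parsing.
result = pd.read_csv(data_source, compression="gzip")
print(result)
```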


宁负流年不负卿

#4 · 2019-01-01 00:55

I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.

Edit: I found that this error crops up when your file contains some text that does not have the same format as the actual data. This is usually header or footer information (more than one line, so skip_header doesn't work), which is not separated by the same number of commas as your actual data (when using read_csv). read_table uses a tab as the delimiter, which could circumvent the user's current error but introduce others.

I usually get around this by reading the extra data into a separate file and then using the read_csv() method.

The exact solution might differ depending on your actual file, but this approach has worked for me in several cases.
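If the number of extra lines is known in advance, an alternative to splitting the file by hand is to tell read_csv to skip them. This is a sketch on invented data, not the approach described above: it assumes exactly two preamble lines and one footer line, and uses the skiprows and skipfooter parameters of read_csv.

```python
import io

import pandas as pd

# Made-up file contents: two lines of preamble, the real table, one footer line.
raw = (
    "Key ratios report\n"
    "Generated for illustration only\n"
    "year,revenue,margin\n"
    "2012,100,0.25\n"
    "2013,120,0.27\n"
    "Source: made-up data\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=2,       # drop the two preamble lines
    skipfooter=1,     # drop the trailing note
    engine="python",  # skipfooter is not supported by the C engine
)
print(df)
```

For a real file, open it in a text editor first to count how many lines precede and follow the table.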


姐姐魅力值爆表

#5 · 2019-01-01 00:55

I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.
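Re-saving from a spreadsheet program probably works because it pads every row to the same width. A similar effect can be sketched directly in pandas by supplying enough column names up front. The data below is invented, and the naive split on commas used to find the widest row assumes there are no quoted commas in the file.

```python
import io

import pandas as pd

# Made-up data: the first rows have 2 fields, a later row has 4.
raw = (
    "a,b\n"
    "1,2\n"
    "3,4,5,6\n"
)

# Find the widest row (naive split: assumes no quoted commas),
# then give read_csv that many column names so short rows are padded with NaN.
width = max(len(line.split(",")) for line in raw.splitlines())
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(width)))
print(df)
```

With header=None the original first row is kept as data, so any real header still has to be handled separately.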


弹指情弦暗扣

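#6 · 2019-01-01 00:56

You could also try skipping the malformed lines. Be aware that this drops those rows from the result:

# pandas >= 1.3; on older versions use error_bad_lines=False instead
data = pd.read_csv('file1.csv', on_bad_lines='skip')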


低头抚发

#7 · 2019-01-01 00:56

I had a dataset with pre-existing row numbers, so I used index_col:

pd.read_csv('train.csv', index_col=0)

