Python Pandas Error tokenizing data

2019-01-01 00:15发布

I'm trying to use pandas to manipulate a .csv file but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I have tried to read the pandas docs, but found nothing.

My code is simple:

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)

How can I resolve this? Should I use the csv module or another language ?

File is from Morningstar

24条回答
余欢
2楼-- · 2019-01-01 00:49

Sometimes the problem is not how to use python, but with the raw data.
I got this error message

Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.

It turned out that in the column description there were sometimes commas. This means that the CSV file needs to be cleaned up or another separator used.

查看更多
梦醉为红颜
3楼-- · 2019-01-01 00:51

Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value for kwarg compression resolved my problem.

result = pandas.read_csv(data_source, compression='gzip')
查看更多
宁负流年不负卿
4楼-- · 2019-01-01 00:55

I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.

Edit: I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter which could circumvent the users current error but introduce others.

I usually get around this by reading the extra data into a file then use the read_csv() method.

The exact solution might differ depending on your actual file, but this approach has worked for me in several cases

查看更多
姐姐魅力值爆表
5楼-- · 2019-01-01 00:55

I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.

查看更多
弹指情弦暗扣
6楼-- · 2019-01-01 00:56

you could also try;

data = pd.read_csv('file1.csv', error_bad_lines=False)
查看更多
低头抚发
7楼-- · 2019-01-01 00:56

I had a dataset with prexisting row numbers, I used index_col:

pd.read_csv('train.csv', index_col=0)
查看更多
登录 后发表回答