I'm trying to use pandas to manipulate a .csv file but I get this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
I have tried to read the pandas docs, but found nothing.
My code is simple:
import pandas as pd

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)
How can I resolve this? Should I use the csv module or another language?
File is from Morningstar
Sometimes the problem is not with how you use Python, but with the raw data.
I got this error message
It turned out that in the column description there were sometimes commas. This means that the CSV file needs to be cleaned up or another separator used.
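To illustrate the point about commas inside a description column, here is a minimal sketch. The sample data and column names are made up for illustration, and it assumes you can either re-export the file with another separator or have the offending fields quoted:

```python
import io
import pandas as pd

# Hypothetical sample: the description on the last row contains an unquoted comma.
raw = "id,description\n1,plain text\n2,red, large\n"

failed = False
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    # Same family of error as in the question: too many fields on one line.
    failed = True
    print("parse failed:", exc)

# Option 1: the same data exported with a different separator.
semi = "id;description\n1;plain text\n2;red, large\n"
df_semi = pd.read_csv(io.StringIO(semi), sep=";")

# Option 2: the same data with the comma-containing field quoted.
quoted = 'id,description\n1,plain text\n2,"red, large"\n'
df_quoted = pd.read_csv(io.StringIO(quoted))
```

Either way the comma stays inside the description value instead of being counted as a field boundary.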
Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value for the compression kwarg resolved my problem.
I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this, but it was a useful workaround in my case. Perhaps someone more knowledgeable can shed more light on why it worked.
Edit: I found that this error crops up when your file contains text that does not have the same format as the actual data. This is usually header or footer information (longer than one line, so skip_header doesn't work) which is not separated by the same number of commas as your actual data (when using read_csv). read_table uses a tab as its default delimiter, which could circumvent the user's current error but introduce others.
I usually get around this by reading the extra data into a file and then using the read_csv() method.
The exact solution might differ depending on your actual file, but this approach has worked for me in several cases.
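If the extra header and footer text sits at known positions, one alternative that avoids an intermediate file is to have read_csv skip those lines. This is only a sketch: the sample content is invented, and the number of lines to skip is an assumption you would need to check against your own file.

```python
import io
import pandas as pd

# Hypothetical file: two title lines, the real table, then one footer line.
raw = (
    "Key ratios report\n"
    "Generated for illustration only\n"
    "year,revenue,margin\n"
    "2012,100,0.25\n"
    "2013,120,0.27\n"
    "Source: made-up numbers\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=2,       # ignore the two title lines before the real header
    skipfooter=1,     # ignore the trailing footer line
    engine="python",  # skipfooter is not supported by the C engine
)
```

Without skiprows, the parser would take the first title line as a one-column header and then fail on the first line that has three fields, much like the error in the question.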
I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.
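Re-saving from a spreadsheet program probably worked because it pads every row to the same width. A rough equivalent in code, assuming the short rows are simply missing trailing values, is to find the widest row first and pass that many column names. The data and the col0, col1, ... names below are placeholders:

```python
import csv
import io
import pandas as pd

# Hypothetical ragged file: the header has 2 fields, but one row has 4.
raw = "a,b\n1,2\n3,4,5,6\n7,8\n"

# Use the csv module to count fields so quoted commas are handled correctly.
width = max(len(row) for row in csv.reader(io.StringIO(raw)))
names = [f"col{i}" for i in range(width)]

# Skip the original (too short) header and supply our own wider one;
# rows with fewer fields are padded with NaN.
df = pd.read_csv(io.StringIO(raw), header=None, names=names, skiprows=1)
```

Whether padding is the right repair depends on why the extra fields exist; if they come from unquoted commas, fixing the separator or quoting is the better route.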
You could also try:
I had a dataset with pre-existing row numbers, so I used index_col:
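A minimal sketch of the index_col approach, assuming the row numbers are stored in the first column of the file (the sample data is invented):

```python
import io
import pandas as pd

# Hypothetical file whose first, unnamed column holds row numbers.
raw = ",name,score\n0,alice,90\n1,bob,85\n"

# index_col=0 tells pandas to use the first column as the index
# instead of treating it as an extra data column.
df = pd.read_csv(io.StringIO(raw), index_col=0)
```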