I'm trying to use pandas to manipulate a .csv file but I get this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
I have tried to read the pandas docs, but found nothing.
My code is simple:
import pandas as pd
path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)
How can I resolve this? Should I use the csv module or another language?
The file is from Morningstar.
I had a similar case as this and setting
worked
The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.
Try it with
data = pd.read_csv(path, skiprows=2)
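A minimal sketch of this fix on synthetic data (the sample text below is made up for illustration and is not the Morningstar file). It assumes two short preamble lines precede the real header, so `skiprows=2` makes pandas start at the header row:

```python
import io

import pandas as pd

# Synthetic stand-in for a file with a two-line preamble (hypothetical data).
raw = (
    "Report title,ACME\n"
    "Financials,\n"
    "year,revenue,income,margin,eps\n"
    "2012,100,10,0.10,1.5\n"
    "2013,120,15,0.125,2.0\n"
)

# The default read infers the column count from the first line, so it is
# expected to fail on the wider rows; the error is caught and printed.
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    print("default read failed:", exc)

# Skipping the two preamble lines lets the real header define the columns.
data = pd.read_csv(io.StringIO(raw), skiprows=2)
print(data)
```

If the number of preamble lines varies between files, it may be worth printing the first few raw lines before choosing the `skiprows` value.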
I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:
The error suggests the problem lies with the C parsing engine (which is the default one). Maybe switching to the Python engine will change something.
Now that is a different error.
If we go ahead and remove the spaces from the table, the error from the Python engine changes once again:
It becomes clear that pandas was having problems parsing our rows. To parse the table with the Python engine, I needed to remove all spaces and quotes from it beforehand. Meanwhile, the C engine kept crashing even with commas in the rows.
To avoid creating a new file with replacements I did this, as my tables are small:
tl;dr
Change the parsing engine, and try to avoid any non-delimiting quotes, commas, or spaces in your data.
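The in-memory cleanup described above might look roughly like the sketch below. This is not the original answer's code: the sample text is synthetic, and stripping every double quote is an assumption that only suits data where quotes carry no meaning.

```python
import io

import pandas as pd

# Synthetic tab-delimited text with stray quotes (hypothetical data).
raw = 'name\tnote\nfoo\tsaid "hi\nbar\t"quoted" text\n'

# Remove the quote characters in memory instead of writing a cleaned file.
cleaned = raw.replace('"', "")

# Parse the cleaned text with the Python engine and an explicit tab separator.
df = pd.read_csv(io.StringIO(cleaned), sep="\t", engine="python")
print(df)
```

For a real file, the text would come from `open(path).read()` rather than a literal string, which is only practical when the table is small enough to hold in memory.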
An alternative that I have found useful for similar parsing errors is to use the csv module to route the data into a pandas DataFrame. For example:
I find the csv module to be a bit more robust to poorly formatted comma-separated files, so I have had success with this route for issues like these.
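One way this csv-module route could look, as a sketch under stated assumptions rather than the original example: the data is synthetic, the first row is taken as the header, and ragged rows are padded or truncated to the header width (truncation discards the extra fields, which may not be what you want).

```python
import csv
import io

import pandas as pd

# Synthetic ragged CSV text (hypothetical data): one short row, one long row.
raw = "a,b,c\n1,2,3\n4,5\n6,7,8,9\n"

# csv.reader yields each line as a list of strings, whatever its length.
rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]
width = len(header)

# Pad short rows with None and cut long rows down to the header width.
normalized = [(row + [None] * width)[:width] for row in body]

df = pd.DataFrame(normalized, columns=header)
print(df)
```

All values arrive as strings here, so numeric columns would still need converting, for example with `pd.to_numeric`.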
This is what I did.
sep='::'
solved my issue.
I had the same problem with read_csv: ParserError: Error tokenizing data. I just saved the old CSV file as a new CSV file, and the problem was solved!
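For the `sep='::'` suggestion above, a small sketch on synthetic data may help. pandas treats a separator longer than one character as a regular expression, which the C engine does not support, so the Python engine is requested explicitly here:

```python
import io

import pandas as pd

# Synthetic text using "::" as the delimiter (hypothetical data); the commas
# inside the fields are ordinary characters, not separators.
raw = "id::text\n1::hello, world\n2::a,b,c\n"

# A multi-character separator needs the Python engine.
df = pd.read_csv(io.StringIO(raw), sep="::", engine="python")
print(df)
```

This only helps when the file really is delimited by `::`; for a comma-delimited file, the separator should be left at its default.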