I'm trying to use pandas to manipulate a .csv file but I get this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
I have tried to read the pandas docs, but found nothing.
My code is simple:
import pandas as pd
path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)
How can I resolve this? Should I use the csv module or another language?
The file is from Morningstar.
I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.
Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.
Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.
Hope that helps.
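To check whether a file really has the same number of fields on every row, something like the following sketch may help. It uses the standard csv module to count fields per row. The helper name count_fields_per_row and the sample text are made up for illustration and do not come from the Morningstar file.

```python
import csv
import io
from collections import Counter


def count_fields_per_row(text, delimiter=","):
    """Map each field count to the number of rows that have it."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return Counter(len(row) for row in reader)


# Made-up sample: a one-field title line followed by a three-column table.
sample = "Report title\nyear,revenue,margin\n2013,100,0.2\n2014,120,0.25\n"
counts = count_fields_per_row(sample)
print(counts)
```

If the result has more than one key, the rows are not uniform, which is the situation the tokenizing error complains about. For a file on disk, the same reader can be built from open(path, newline="") instead of io.StringIO.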
You may be able to avoid the problem by adding this argument to the read_csv call:
header=None
Hope this helps!
This is most likely a delimiter issue, as many "CSV" files are actually created with a tab separator:
sep='\t'
So try calling read_csv with the tab character (\t) as the separator, i.e. sep='\t'.
Try:
pandas.read_csv(path, sep=',', header=None)
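Both suggestions above come down to passing the delimiter the file actually uses. Here is a minimal sketch with made-up, in-memory tab-delimited data (the column names and values are hypothetical), showing how the default comma separator differs from sep='\t':

```python
import io

import pandas as pd

# Made-up tab-delimited sample; not taken from the original file.
tsv_text = "name\tprice\tvolume\nGOOG\t100.5\t2000\nMSFT\t50.25\t3000\n"

# Default separator is a comma, so each line is read as one single field.
wrong = pd.read_csv(io.StringIO(tsv_text))

# Passing the tab character splits the lines into the intended columns.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape, right.shape)
```

If the column count looks wrong after reading, printing the first few raw lines of the file is usually the quickest way to see which delimiter is in use.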
Your CSV file might have a variable number of columns, and read_csv inferred the number of columns from the first few rows. There are two ways to solve it in this case:
1) Change the CSV file to have a dummy first line with the max number of columns (and specify header=[0]).
2) Or use names = list(range(0, N)), where N is the max number of columns.
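The second option can be sketched as follows, assuming the file is small enough to scan once to find N. The ragged sample text is made up, and the csv module is used here only to count fields per row.

```python
import csv
import io

import pandas as pd

# Made-up sample whose rows have 2, 4 and 1 fields.
ragged_text = "1,2\n3,4,5,6\n7\n"

# N = the largest number of fields found on any row.
max_cols = max(len(row) for row in csv.reader(io.StringIO(ragged_text)))

# Supplying N column names up front lets shorter rows be padded with NaN.
df = pd.read_csv(
    io.StringIO(ragged_text),
    header=None,
    names=list(range(max_cols)),
)
print(df)
```

Rows that are shorter than N end up with missing values in the trailing columns, so those columns may be read as floats rather than integers.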
pandas.read_csv('CSVFILENAME', header=None, sep=', ')
I used this when trying to read CSV data from the link
http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
I copied the data from the site into my CSV file. It had extra spaces after the commas, so I used sep=', ' and it worked :)
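As a possible alternative to a two-character separator, read_csv also accepts skipinitialspace=True, which drops spaces that follow the delimiter. Here is a minimal sketch comparing the two on made-up rows in the same comma-plus-space style (the values are illustrative, not copied from the dataset):

```python
import io

import pandas as pd

# Made-up rows with a space after each comma.
spaced_text = "39, State-gov, 77516\n50, Self-emp-not-inc, 83311\n"

# Plain comma separator: string fields keep their leading space.
raw = pd.read_csv(io.StringIO(spaced_text), header=None)

# Option A: multi-character separator; engine="python" is passed explicitly
# because separators longer than one character are handled by that engine.
by_sep = pd.read_csv(io.StringIO(spaced_text), header=None, sep=", ", engine="python")

# Option B: keep the comma separator and skip the spaces after it.
by_skip = pd.read_csv(io.StringIO(spaced_text), header=None, skipinitialspace=True)

print(repr(raw.iloc[0, 1]), repr(by_sep.iloc[0, 1]), repr(by_skip.iloc[0, 1]))
```

Option B stays on the default parser, which may matter for large files.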