I want to import two kinds of CSV files: some use ";" as the delimiter and others use ",". So far I have been switching between the following two lines:
reader = csv.reader(f, delimiter=';')
or
reader = csv.reader(f, delimiter=',')
Is it possible not to specify the delimiter and to let the program check for the right delimiter?
The solutions below (Blender and sharth) seem to work well for comma-separated files (generated with LibreOffice) but not for semicolon-separated files (generated with MS Office). Here are the first lines of one semicolon-separated file:
ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes
1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
The csv module seems to recommend using the csv sniffer for this problem. They give the following example, which I've adapted for your case.
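A minimal sketch of the sniffer approach, assuming the only candidate delimiters are ";" and ",". The helper name read_rows, its parameters, and the 1024-character sample size are illustrative choices, not part of the csv module; the two in-memory files are made-up sample data.

```python
import csv
import io


def read_rows(f, candidates=";,", sample_size=1024):
    """Return all rows of an open text file, guessing its delimiter.

    read_rows, candidates and sample_size are hypothetical names
    used only for this sketch.
    """
    sample = f.read(sample_size)
    f.seek(0)  # rewind so the reader starts at the first row
    # sniff() raises csv.Error if it cannot determine a delimiter
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return list(csv.reader(f, dialect))


# Made-up sample data, one file per delimiter
semicolon_file = io.StringIO("a;b;c\n1;2;3\n4;5;6\n")
comma_file = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")

print(read_rows(semicolon_file)[0])
print(read_rows(comma_file)[0])
```

Restricting the sniffer to the expected delimiters through the delimiters argument tends to make the guess more reliable than letting it consider every character.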
Let's try it out.
And our sample inputs
And if we execute the example program:
It's also probably worth noting what version of Python I'm using.
Given a project that deals with both , (comma) and | (vertical bar) delimited CSV files, which are well formed, I tried the following (as given at https://docs.python.org/2/library/csv.html#csv.Sniffer):
However, on a |-delimited file, the "Could not determine delimiter" exception was raised. It seemed reasonable to speculate that the sniff heuristic might work best if each line has the same number of delimiters (not counting whatever might be enclosed in quotes). So, instead of reading the first 1024 bytes of the file, I tried reading the first two lines in their entirety:
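A minimal sketch of that idea, assuming Python 3 text-mode files. The helper name sniff_from_first_lines and its parameters are hypothetical, and passing the candidate delimiters explicitly is an extra assumption of this sketch rather than something stated above.

```python
import csv
import io


def sniff_from_first_lines(f, n_lines=2, candidates=",|"):
    """Guess the dialect from the first n_lines complete lines of f.

    The name and parameters are hypothetical, chosen for this sketch.
    """
    start = f.tell()
    # Whole lines, so the sample never ends in the middle of a record
    sample = "".join(f.readline() for _ in range(n_lines))
    f.seek(start)  # rewind so the caller can read from the beginning
    return csv.Sniffer().sniff(sample, delimiters=candidates)


# Made-up |-delimited sample data
pipe_file = io.StringIO("id|name|city\n1|Ann|Oslo\n2|Bob|Rome\n")
dialect = sniff_from_first_lines(pipe_file)
for row in csv.reader(pipe_file, dialect):
    print(row)
```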
So far, this is working well for me.
I don't think there can be a perfectly general solution to this (one of the reasons I might use , as a delimiter is that some of my data fields need to be able to include ; ...). A simple heuristic for deciding might be to simply read the first line (or more), count how many , and ; characters it contains (possibly ignoring those inside quotes, if whatever creates your .csv files quotes entries properly and consistently), and guess that the more frequent of the two is the right delimiter.
And if you're using DictReader you can do that:
I used this with Python 3.5 and it worked this way.
To solve the problem, I have created a function which reads the first line of a file (header) and detects the delimiter.
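A minimal sketch of such a header-based check, using the counting heuristic described earlier (occurrences of each candidate outside double quotes). The function name detect_delimiter, its candidates parameter, and the sample data are hypothetical; this is one possible implementation, not the function the author wrote.

```python
import csv
import io


def detect_delimiter(header_line, candidates=(";", ",")):
    """Return the candidate that occurs most often outside double quotes.

    detect_delimiter and candidates are hypothetical names for this sketch.
    """
    counts = dict.fromkeys(candidates, 0)
    in_quotes = False
    for ch in header_line:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every double quote
        elif not in_quotes and ch in counts:
            counts[ch] += 1
    # On a tie, max() keeps the first candidate listed
    return max(candidates, key=counts.get)


# Made-up sample: the quoted comma must not be counted
sample = io.StringIO('ReleveAnnee;ReleveMois;"Titre, RMC";Notes\n1999;1;"a, b";x\n')
delimiter = detect_delimiter(sample.readline())
sample.seek(0)  # rewind so DictReader sees the header row
for record in csv.DictReader(sample, delimiter=delimiter):
    print(record["ReleveMois"], record["Titre, RMC"])
```

The detected delimiter can be passed to csv.DictReader or csv.reader alike. The heuristic will guess wrong when the header contains more unquoted occurrences of the other candidate than of the real delimiter.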