I know this (or similar) has been asked many times but having tried out numerous possibilities I've not been able to find a a regex that works 100%.
I've got a CSV file and I'm trying to split it into an array, but encountering two problems: quoted commas and empty elements.
The CSV looks like:
123,2.99,AMO024,Title,"Description, more info",,123987564
The regex I've tried to use is:
thisLine.split(/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/)
The only problem is that in my output array the 5th element comes out as 123987564 and not an empty string.
If you know that you won't have an empty field (,,) then this expression works well:
As in the following example...
However, if you anticipate an empty field and your text is relatively small than you might consider replacing the empty fields with a space prior to parsing to ensure that they are captured. For example...
And if you need to maintain the integrity of the fields, you can restore the commas and test for empty spaces inside the loop. This may not be the most efficient method but it gets the job done.
I needed this answer too, but I found the answers, while informative, a little hard to follow and replicate for other languages. Here is the simplest expression I came up with for a single column out of the CSV line. I am not splitting. I'm building a regex to match a column out of the CSV so I'm not splitting the line:
This matches a single column from the CSV line. The first portion
"([^"]*)"
of the expression is to match a quoted entry, the second part[^,]*
is to match a non-quoted entry. Then either followed by a,
or end of line$
.And the accompanying debuggex to test out the expression.
https://www.debuggex.com/r/s4z_Qi2gZiyzpAhx
The correct regular expression to match a single quoted value with escaped [doubled] single quotes in it is:
I had a similar need for splitting CSV values from SQL insert statements.
In my case, I could assume that strings were wrapped in single quotations and numbers were not.
For some probably obvious reason, this regex produces some blank results. I could ignore those, since any empty values in my data were represented as
...,'',...
and not...,,...
.I'm using this one, it works with coma separator and double quote escaping. Normally that's should solved your problem :
Description
Instead of using a split, I think it would be easier to simply execute a match and process all the found matches.
This expression will:
Regex:
(?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)
Example
Sample Text
ASP example using the non-java expression
Matches using the non-java expression
Group 0 gets the entire substring which includes the comma
Group 1 gets the quote if it's used
Group 2 gets the value not including the comma