I'm reading a CSV file in C++, and each row looks like this:
"Primary, Secondary, Third", "Primary", , "Secondary", 18, 4, 0, 0, 0
(notice the empty value)
When I do:
while (std::getline(ss, csvElement, ',')) {
    csvColumn.push_back(csvElement);  // splits on every comma, including those inside quotes
}
This splits the first quoted string into three pieces, which isn't what I want.
How do I keep the quoted string intact when iterating? I tried combining the loop above with splitting the line on double quotes, but I got wild results.
You need to interpret each comma depending on whether you're between quotes or not. This is too complex for getline().
The solution is to read the full line with getline(), then parse it by iterating through the string character by character while maintaining a flag that tells you whether you're between double quotes or not. A first "raw" example is available as an online demo (double quotes are not removed from the fields, and escape characters are not interpreted).
Here is the C++ approach I have used.
I noticed that your rows contain only 3 field types: string, null (empty), and int.
The following approach handles these field types (in the method "void init()") in the order each row presents the fields, sometimes using string::find() (instead of getline()) to locate the end of a field.
Each of the 3 methods consumes characters from the string with erase(). I know erase() is slow, but I chose it for convenience: erasing is easier to test, since you can just add a cout after each extraction. The erase() calls can be removed or replaced by tracking a start-of-search index where needed.
Using std::quoted allows you to read quoted strings from input streams (see the live example).
The caveat is that quoted strings are only extracted if the first non-whitespace character of a value is a double-quote. Additionally, any characters after the quoted strings will be discarded up until the next comma.