I know how to split a string into comma-separated substrings, but here's a complication: what if a substring itself contains a comma?
If a substring contains a comma, a newline, or a double quote, the entire substring is enclosed in double quotes.
If a substring contains a double quote, that double quote is escaped with another double quote. The worst-case scenario would be something like this:
first,"second, second","""third"" third","""fourth"", fourth"
In this case the substrings are:
- first
- second, second
- "third" third
- "fourth", fourth
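These are the standard CSV quoting rules (RFC 4180), so a stock CSV reader should already produce the four substrings listed above. A quick check with Python's standard `csv` module; Python is used here only for illustration, since the question does not name a language:

```python
import csv

line = 'first,"second, second","""third"" third","""fourth"", fourth"'

# csv.reader accepts any iterable of lines; its default "excel" dialect
# strips the enclosing quotes and turns "" inside a quoted field into ".
fields = next(csv.reader([line]))

for f in fields:
    print(f)
# first
# second, second
# "third" third
# "fourth", fourth
```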
`second, second` is enclosed in double quotes; I don't want those quotes in the resulting list/array.
`"third" third` is enclosed in double quotes because it contains double quotes, and those are escaped with additional double quotes. Again, I want neither the enclosing quotes nor the escaping quotes in the list/array, but I do want the original double quotes that are part of the substring.
Thank you for your answers, but before I saw them I wrote this solution. It's not pretty, but it works for me.
Mini parser
Result
One way is to use `TextFieldParser`:
For
Try this
I would suggest constructing a small state machine for this problem. You would have states like:
This will read CSV correctly. You can also make the separator configurable, so that you support TSV or semicolon-separated formats.
Also keep in mind one very important case in the CSV format: a quoted field may contain a newline! Another special case to watch for is an empty field (e.g. `,,`).
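A minimal sketch of such a state machine in Python, assuming the whole input is in memory and rows end with `\n` (no `\r\n` handling). The `State` names and the `parse_csv` function are illustrative, not taken from the answer. The separator is a parameter, newlines inside quoted fields are kept, and empty fields come out as empty strings:

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()      # at the start of a field
    UNQUOTED = auto()         # inside a field that did not start with a quote
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just read a quote inside a quoted field


def parse_csv(text, sep=","):
    """Parse CSV text into a list of rows, each a list of field strings."""
    rows, row, field = [], [], []
    state = State.FIELD_START

    def end_field():
        row.append("".join(field))
        field.clear()

    def end_row():
        end_field()
        rows.append(row.copy())
        row.clear()

    for ch in text:
        if state is State.QUOTED:
            if ch == '"':
                state = State.QUOTE_IN_QUOTED  # closing quote or first half of ""
            else:
                field.append(ch)  # separators and newlines are literal here
        elif state is State.QUOTE_IN_QUOTED:
            if ch == '"':
                field.append('"')  # "" is an escaped quote
                state = State.QUOTED
            elif ch == sep:
                end_field()
                state = State.FIELD_START
            elif ch == "\n":
                end_row()
                state = State.FIELD_START
            else:
                field.append(ch)  # lenient: text after a closing quote
                state = State.UNQUOTED
        else:  # FIELD_START or UNQUOTED
            if ch == '"' and state is State.FIELD_START:
                state = State.QUOTED
            elif ch == sep:
                end_field()  # also yields "" for empty fields such as ",,"
                state = State.FIELD_START
            elif ch == "\n":
                end_row()
                state = State.FIELD_START
            else:
                field.append(ch)
                state = State.UNQUOTED

    # Flush the last row unless the text ended with a newline (or was empty).
    if field or row or state is not State.FIELD_START:
        end_row()
    return rows
```

Passing `sep="\t"` or `sep=";"` covers the TSV and semicolon variants. Malformed input, such as text after a closing quote, is accepted leniently here; a stricter parser would report an error in that state instead.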
This is not the most elegant solution, but it might help you. I would loop through the characters and keep an odd/even count of the quotes: for example, a bool that is true when you have encountered an odd number of quotes and false when the count is even.
Any comma encountered while this bool is true is inside a quoted field and should not be treated as a separator. Once you know which commas are real separators, you can use that information in several ways. Below I replaced the delimiter with something more manageable (not very efficient, though):
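A minimal Python sketch of the odd/even idea. It assumes the ASCII unit separator `\x1f` never occurs in the input, so it can stand in as the replacement delimiter; `split_outside_quotes` and `PLACEHOLDER` are hypothetical names. It only splits on separators, not on newlines, so it handles one record at a time, and it leaves the quotes in place:

```python
PLACEHOLDER = "\x1f"  # assumed never to appear in the input


def split_outside_quotes(line, sep=","):
    """Split line on sep, ignoring separators that sit inside quotes.

    in_quotes flips on every '"', so it is True exactly when an odd number
    of quotes has been seen so far. An escaped quote ("") flips it twice,
    which leaves the state unchanged.
    """
    in_quotes = False
    chars = []
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            chars.append(PLACEHOLDER)  # a real separator
        else:
            chars.append(ch)
    return "".join(chars).split(PLACEHOLDER)
```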
At this point you should have an array of strings split only on the intended commas. From here on, handling the quotes is simpler.
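One possible way to handle the quotes once the fields are split, sketched in Python. It assumes every field is well formed: a quoted field starts and ends with a quote, and every quote inside it is doubled. `unquote` is a hypothetical helper name:

```python
def unquote(field):
    """Drop the enclosing quotes and turn each "" back into a single "."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field  # unquoted fields are returned unchanged


parts = ['first', '"second, second"', '"""third"" third"', '"""fourth"", fourth"']
print([unquote(p) for p in parts])
# ['first', 'second, second', '"third" third', '"fourth", fourth']
```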
I hope this can help you.