Given
2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,"Corvallis, OR",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34
How to use C# to split the above information into strings as follows:
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
As you can see one of the column contains , <= (Corvallis, OR)
// update // Based on C# Regex Split - commas outside quotes
string[] result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
It is a tricky matter to parse .csv files when the .csv file could be either comma separated strings, comma separated quoted strings, or a chaotic combination of the two. The solution I came up with allows for any of the three possibilities.
I created a method, ParseCsvRow() which returns an array from a csv string. I first deal with double quotes in the string by splitting the string on double quotes into an array called quotesArray. Quoted string .csv files are only valid if there is an even number of double quotes. Double quotes in a column value should be replaced with a pair of double quotes (This is Excel's approach). As long as the .csv file meets these requirements, you can expect the delimiter commas to appear only outside of pairs of double quotes. Commas inside of pairs of double quotes are part of the column value and should be ignored when splitting the .csv into an array.
My method will test for commas outside of double quote pairs by looking only at even indexes of the quotesArray. It also removes double quotes from the start and end of column values.
One downside of my approach is the way I temporarily replace delimiter commas with an obscure unicode character. This character needs to be so obscure, it would never show up in your .csv file. You may want to put more handling around this.
Use the
Microsoft.VisualBasic.FileIO.TextFieldParser
class. This will handle parsing a delimited file,TextReader
orStream
where some fields are enclosed in quotes and some are not.For example:
This should result in the following output:
See Microsoft.VisualBasic.FileIO.TextFieldParser for more information.
You need to add a reference to
Microsoft.VisualBasic
in the Add References .NET tab.Use a library like LumenWorks to do your CSV reading. It'll handle fields with quotes in them and will likely overall be more robust than your custom solution by virtue of having been around for a long time.
You could split on all commas that do have an even number of quotes following them.
You would also like to view at the
specf
for CSV format about handling comma's.Useful Link :
C# Regex Split - commas outside quotes
I had a problem with a CSV that contains fields with a quote character in them, so using the TextFieldParser, I came up with the following:
A StreamReader is still used to read the CSV line by line, as follows:
With Cinchoo ETL - an open source library, it can automatically handles columns values containing separators.
Output:
For more information, please visit codeproject article.
Hope it helps.