I've got a simple application that opens a tab-delimited text file and inserts that data into a database.
I'm using this CSV reader to read the data: http://www.codeproject.com/KB/database/CsvReader.aspx
And it is all working just fine!
Now my client has added a new field to the end of the file, "ClaimDescription", and some of these claim descriptions contain quotes, for example:
"SUMISEI MARU NO 2" - sea of Japan
This seems to be causing a major headache for my app. I get an exception which looks like this:
The CSV appears to be corrupt near record '1470' field '26 at position '181'. Current raw data : ...
And in that "raw data", sure enough the claim description field shows data with quotes in it.
Has anyone had this problem before and found a way round it? Obviously I could ask the client to change the data they send me, but they generate the tab-delimited file with an automated process, so I'd rather keep that as a last resort.
I was thinking I could open the file with a standard TextReader beforehand, escape any quotes, write the content back into a new file, and then feed that file into the CSV reader. It is probably worth mentioning that these tab-delimited files average around 40MB.
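A sketch of that preprocessing pass, written in Python purely for illustration (it is not the C# CsvReader API). It assumes every " in the input is literal data: rows are read with quote processing switched off and written back with RFC 4180 quoting, which wraps any field containing a quote and doubles the embedded quotes. Because rows are handled one at a time, a 40MB file is never loaded into memory in full.

```python
import csv
import io


def escape_quotes(src, dst):
    """Copy tab-delimited rows from src to dst, re-quoting them per RFC 4180.

    src and dst are open text streams; only one row is held in memory at a time.
    """
    # QUOTE_NONE: the reader treats every " as ordinary data.
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    # QUOTE_MINIMAL quotes only fields that contain the delimiter, a quote
    # or a line break, doubling any embedded quotes.
    writer = csv.writer(dst, delimiter="\t", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# In-memory demonstration with one sample row from the question.
demo_out = io.StringIO()
escape_quotes(io.StringIO('1470\t"SUMISEI MARU NO 2" - sea of Japan\n'), demo_out)
print(demo_out.getvalue(), end="")
```

For real files, open both the input and the output with newline="" (as the csv module documentation recommends) and pass them to escape_quotes; the output can then be read by any parser that honours doubled quotes.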
Any help is greatly appreciated!
Cheers, Sean
Check the comment on the codeproject article about quotes:
http://www.codeproject.com/Messages/3382857/Re-Quotes-inside-of-the-Field.aspx
You need to specify in the constructor that a character other than " should be used as the quote character.
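The effect of the quote setting can be seen with Python's csv module, used here only as an analogue of the CsvReader constructor option (QUOTE_NONE is Python's way of saying "no quote character at all"; the sample line reuses the data from the question). With quote handling on and strict checking enabled, the stray quote is rejected, much like the exception in the question; with quote processing off, the field comes through intact.

```python
import csv

SAMPLE = '1470\t"SUMISEI MARU NO 2" - sea of Japan'


def parse_line(line, use_quotes):
    """Split one tab-delimited line into its fields."""
    if use_quotes:
        # Standard quote handling; strict=True raises csv.Error on a quote in
        # an unexpected place instead of silently dropping it.
        reader = csv.reader([line], delimiter="\t", strict=True)
    else:
        # No quote processing: " is kept as ordinary field data.
        reader = csv.reader([line], delimiter="\t", quoting=csv.QUOTE_NONE)
    return next(reader)


try:
    parse_line(SAMPLE, use_quotes=True)
except csv.Error as err:
    print("rejected:", err)

print(parse_line(SAMPLE, use_quotes=False))
```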
Use the FileHelpers library instead. It is widely used and will cope with quoted fields, or fields that contain quotes.
Right, after a late night of Red Bull and head-scratching I eventually found the problem: it was commas in the "Claim_Description" field. I didn't even think about that because I was using a tab-delimited file, but as soon as I did a find-and-replace on all commas in the file it worked absolutely fine!
The next step is to find out how to replace those commas before processing.
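One way to do that replacement, sketched in Python (not part of the original thread): stream the file line by line so the whole 40MB never has to sit in memory. This is only safe because commas are never separators in a tab-delimited file, and it does alter the claim descriptions. The function name and the default replacement character (a space) are hypothetical choices.

```python
def replace_commas(lines, replacement=" "):
    """Yield each line with every comma replaced by `replacement`.

    Works lazily, so a large file is processed one line at a time.
    """
    for line in lines:
        yield line.replace(",", replacement)


# Usage with real files (hypothetical names):
# with open("claims.txt") as src, open("claims_nocommas.txt", "w") as dst:
#     dst.writelines(replace_commas(src))
```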
Again, thanks for all the suggestions.
Cheers, Sean
I recently solved a similar issue: CsvReader was working properly on all but a few lines of my TSV file, and what solved the problem in the end was setting a customDelimiter in the constructor of CsvReader.
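Why the delimiter matters can be shown with Python's csv module as a stand-in for CsvReader (the sample row is invented for illustration). With a comma as the separator, the description is split in two; with a tab, it stays whole and the comma is just data.

```python
import csv

SAMPLE = "1470\tHull damage, port side\tOPEN"


def split_fields(line, delimiter):
    """Split one line on `delimiter`, treating quote characters as data."""
    return next(csv.reader([line], delimiter=delimiter, quoting=csv.QUOTE_NONE))


print(split_fields(SAMPLE, ","))   # wrong separator: the comma splits the text
print(split_fields(SAMPLE, "\t"))  # tab separator: three fields, comma kept
```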
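I did some searching, and there is an RFC for CSV files (RFC 4180) that explicitly prohibits what they are doing: "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields."
Basically, if they want to do that, they need to enclose the whole field in quotes and double each quote inside it, like so:
"""SUMISEI MARU NO 2"" - sea of Japan"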
So if you want you can throw this problem back at them and insist they send you a "proper" RFC 4180 CSV file.
Since you have access to the source files for that CSV reader, another option would be to modify it to handle the kind of quoted strings they are feeding you.
This kind of situation is exactly why it is vital to have source code access to your toolset.
If instead you'd like to preprocess (hack) their files before feeding them to your tool, the correct method would be to look for fields containing a quote that is not immediately in front of or behind a separator, and enclose the whole field in another set of quotes, doubling the quotes already inside it as RFC 4180 requires.
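A minimal Python sketch of that heuristic, assuming tab separators and a supplier that never quotes fields deliberately: a field is re-quoted only when it contains a quote that is not its first or last character, and its embedded quotes are doubled. The function name fix_line is hypothetical.

```python
def fix_line(line, delimiter="\t"):
    """Return `line` with suspicious fields enclosed in quotes per RFC 4180."""
    fields = line.rstrip("\r\n").split(delimiter)
    fixed = []
    for field in fields:
        # A quote that is neither the first nor the last character of the
        # field is not immediately next to a separator or line boundary.
        stray_quote = any(ch == '"' and 0 < i < len(field) - 1
                          for i, ch in enumerate(field))
        if stray_quote:
            field = '"' + field.replace('"', '""') + '"'
        fixed.append(field)
    return delimiter.join(fixed)


print(fix_line('1470\t"SUMISEI MARU NO 2" - sea of Japan\n'))
```

The returned line has its line break stripped, so add one when writing it out. Being a heuristic, it would wrap a field that is already correctly escaped (such as "a""b") a second time, so it only suits files that never use quoting on purpose.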
Use OleDbConnection: http://social.msdn.microsoft.com/Forums/en/winformsdatacontrols/thread/98fce7d7-b02d-4027-ad2e-2df3a28bd439