I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.
Some of the ideas we are looking at are quoted identifiers (value "," values "," etc.) or using a | instead of a comma. The biggest problem is that we have to make it easy, or the customer won't do it.
Put double quotes around strings. That is generally what Excel does.
Ala Eli,
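The simplest solution I've found is the one LibreOffice uses:
1. Replace every literal `"` by `”`.
2. Put double quotes around the string.

You can also use the one that Excel uses:
1. Replace every literal `"` by `""`.
2. Put double quotes around the string.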
Note that other people recommended doing only step 2 above, but that doesn't work for values where a `"` is followed by a `,`. For example, in a CSV where you want a single column containing the string `hello",world`, the CSV would read `"hello",world"`, which is interpreted as a row with two columns: `hello` and `world"`.
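As a minimal sketch of the Excel-style rule (double the embedded quotes, then wrap the value), the following Python snippet builds a line by hand and parses it back with the standard `csv` module. The helper names `quote_field`, `to_csv_line` and `parse_line` are made up for this illustration.

```python
import csv
import io


def quote_field(value: str) -> str:
    """Double every embedded quote, then wrap the value in quotes."""
    return '"' + value.replace('"', '""') + '"'


def to_csv_line(fields) -> str:
    """Join already-quoted fields with commas (no line terminator)."""
    return ",".join(quote_field(f) for f in fields)


def parse_line(line: str):
    """Parse a single CSV line with the standard library reader."""
    return next(csv.reader(io.StringIO(line)))


good = to_csv_line(['hello",world', "Acme, Inc."])
print(good)              # "hello"",world","Acme, Inc."
print(parse_line(good))  # ['hello",world', 'Acme, Inc.']

# Wrapping without doubling the quote splits the value in two.
bad = '"' + 'hello",world' + '"'
print(parse_line(bad))   # ['hello', 'world"']
```

The last call reproduces the two-column misreading described above when only the wrapping step is applied.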
Add a reference to Microsoft.VisualBasic (yes, it says VisualBasic, but it works in C# just as well; remember that in the end it is all just IL).
Use the `Microsoft.VisualBasic.FileIO.TextFieldParser` class to parse the CSV file.

I used the Csvreader library, but it split the data on every comma (,), including the commas inside column values.
So if you want to insert CSV file data that contains commas (,) in most of the column values, you can use the function from this gist: https://gist.github.com/jaywilliams/385876
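The failure described above, where splitting on every comma breaks quoted values, can be shown in a few lines. This is only a Python sketch using the standard `csv` module, not the function from the linked gist and not `TextFieldParser`; the sample line and the helper names `naive_split` and `parse_line` are invented for the example.

```python
import csv
import io


def naive_split(line: str):
    """Split on every comma, ignoring quotes (the behaviour to avoid)."""
    return line.split(",")


def parse_line(line: str):
    """Let a quote-aware CSV parser handle the line instead."""
    return next(csv.reader(io.StringIO(line)))


line = '42,"Acme, Inc.",Berlin'
print(naive_split(line))  # ['42', '"Acme', ' Inc."', 'Berlin']
print(parse_line(line))   # ['42', 'Acme, Inc.', 'Berlin']
```

Whatever the language, the point is the same: use a parser that understands quoted fields rather than a plain string split.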
You can put double quotes around the fields. I don't like this approach, as it adds another special character (the double quote). Just define an escape character (usually backslash) and use it wherever you need to escape something.
You don't have to try to match quotes, and you have fewer exceptions to parse. This simplifies your code, too.
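For illustration, here is how the escape-character approach might look with Python's `csv` module, assuming backslash as the escape character and quoting switched off entirely. The helper names `write_escaped` and `read_escaped` are hypothetical, and the sample data deliberately contains no backslashes or quotes.

```python
import csv
import io


def write_escaped(rows) -> str:
    """Write rows without quoting; commas in values get a backslash prefix."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, quoting=csv.QUOTE_NONE, escapechar="\\", lineterminator="\n"
    )
    writer.writerows(rows)
    return buf.getvalue()


def read_escaped(text: str):
    """Read the same format back, treating backslash as the escape character."""
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE, escapechar="\\")
    return list(reader)


rows = [["42", "Acme, Inc.", "Berlin"]]
text = write_escaped(rows)
print(text, end="")                # 42,Acme\, Inc.,Berlin
print(read_escaped(text) == rows)  # True
```

Note that this only works if the program reading the file is configured for the same escape character; spreadsheet tools generally expect quoted fields instead.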
In case you're on a *nix system, have access to `sed`, and there can be one or more unwanted commas only in one specific field of your CSV, you can use the following one-liner to enclose that field in `"`, as RFC 4180 Section 2 proposes:
Depending on which field the unwanted comma(s) may be in, you have to alter or extend the capturing groups of the regex (and the substitution).
The example above will enclose the fourth field (out of six) in quotation marks.
In combination with the `--in-place` option you can apply these changes directly to the file.

In order to "build" the right regex, there's a simple principle to follow:
1. For every field that comes before the field with the unwanted comma(s), write one `[^,]*,` and put them all together in a capturing group.
2. For the field that contains the unwanted comma(s), write `(.*)`.
3. For every field that comes after the field with the unwanted comma(s), write one `,.*` and put them all together in a capturing group.

Here is a short overview of different possible regexes/substitutions depending on the specific field. If not given, the substitution is `\1"\2"\3`.

If you want to remove the unwanted comma(s) with `sed` instead of enclosing them in quotation marks, refer to this answer.
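The regex-building principle above can be sanity-checked outside of `sed`. The sketch below uses Python's `re` module rather than the answer's own one-liner; the function name `quote_nth_field`, the zero-based `index` parameter and the added `^`/`$` anchors are assumptions made for this example. Like the `sed` approach, it assumes that only the target field contains extra commas.

```python
import re


def quote_nth_field(line: str, index: int, total: int) -> str:
    """Enclose field `index` (0-based) of `total` fields in double quotes."""
    # One "[^,]*," per field before the target, all in one capturing group.
    before = "(" + "[^,]*," * index + ")"
    # The target field itself may contain commas.
    target = "(.*)"
    # One ",.*" per field after the target, all in one capturing group.
    after = "(" + ",.*" * (total - index - 1) + ")"
    pattern = "^" + before + target + after + "$"
    return re.sub(pattern, r'\1"\2"\3', line)


# Fourth field out of six contains an unwanted comma.
line = "1,2,3,Acme, Inc.,5,6"
print(quote_nth_field(line, 3, 6))  # 1,2,3,"Acme, Inc.",5,6
```

Because the middle group is greedy, it takes everything up to the point where exactly enough commas remain for the trailing fields, which is why the extra comma ends up inside the quotes.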