Dealing with commas in a CSV file

Posted 2018-12-31 01:21

I am looking for suggestions on how to handle a CSV file that is created and then uploaded by our customers, and that may have a comma in a value, such as a company name.

Some of the ideas we are looking at are quoted identifiers ("value","value", etc.) or using a | instead of a comma. The biggest problem is that we have to make it easy, or the customer won't do it.

Tags: csv
23 answers
梦该遗忘
#2 · 2018-12-31 02:14

Put double quotes around strings. That is generally what Excel does.

As Eli points out, you escape a double quote inside a quoted field by writing it as two double quotes, e.g. "test1","foo""bar","test2"
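For illustration, here is a minimal sketch of that quoting rule using Python's standard csv module. The helper name to_csv_line and the sample values are made up for this example; the thread itself does not prescribe a language.

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one row, quoting every field and doubling embedded quotes."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes; the writer's default
    # doublequote=True turns an embedded " into "".
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()

print(to_csv_line(["test1", 'foo"bar', "test2"]), end="")
# prints: "test1","foo""bar","test2"
print(to_csv_line(["Acme, Inc.", "42"]), end="")
# prints: "Acme, Inc.","42"
```

The comma inside the company name survives because it sits inside the quotes, which is the same convention Excel follows when it exports CSV.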

萌妹纸的霸气范
#3 · 2018-12-31 02:14

The simplest solution I've found is the one LibreOffice uses:

  1. Replace all literal " by
  2. Put double quotes around your string

You can also use the one that Excel uses:

  1. Replace all literal " by ""
  2. Put double quotes around your string

Note that other answers recommend doing only step 2 above, but that fails on lines where a " is followed by a comma. For example, if a single column should hold the string hello",world, the CSV would read:

"hello",world"

Which is interpreted as a row with two columns: hello and world"
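The difference can be checked with a standards-following parser. The sketch below uses Python's csv module as one such parser (an assumption for illustration; any parser that follows the doubled-quote convention should behave similarly), and the helper name parse_line is hypothetical.

```python
import csv

def parse_line(line):
    """Parse a single CSV line into a list of fields."""
    return next(csv.reader([line]))

# Step 2 only: the embedded quote ends the field early.
print(parse_line('"hello",world"'))
# prints: ['hello', 'world"']

# Steps 1 and 2: the embedded quote is doubled first, so it survives.
print(parse_line('"hello"",world"'))
# prints: ['hello",world']
```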

泛滥B
#4 · 2018-12-31 02:15

Add a reference to the Microsoft.VisualBasic assembly (yes, it says VisualBasic, but it works in C# just as well; in the end it is all just IL).

Use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse the CSV file. Here is the sample code:

 ' Requires: Imports Microsoft.VisualBasic.FileIO
 Dim parser As TextFieldParser = New TextFieldParser("C:\mar0112.csv")
 parser.TextFieldType = FieldType.Delimited
 parser.SetDelimiters(",")

   While Not parser.EndOfData
      ' Read one row; quoted fields may contain commas
      Dim fields() As String = parser.ReadFields()
      For Each field As String In fields
         'TODO: Process field

      Next
   End While
   ' Close the parser only after all rows have been read
   parser.Close()
旧时光的记忆
#5 · 2018-12-31 02:15

I used the Csvreader library, but it split column values on the commas they contained.

So if you want to import CSV data in which many column values contain commas, you can use the function below. Author link => https://gist.github.com/jaywilliams/385876

function csv_to_array($filename='', $delimiter=',')
{
    if(!file_exists($filename) || !is_readable($filename))
        return FALSE;

    $header = NULL;
    $data = array();
    if (($handle = fopen($filename, 'r')) !== FALSE)
    {
        while (($row = fgetcsv($handle, 1000, $delimiter)) !== FALSE)
        {
            if(!$header)
                $header = $row;
            else
                $data[] = array_combine($header, $row);
        }
        fclose($handle);
    }
    return $data;
}
低头抚发
#6 · 2018-12-31 02:16

You can put double quotes around the fields. I don't like this approach, as it adds another special character (the double quote). Just define an escape character (usually backslash) and use it wherever you need to escape something:

data,more data,more data\, even,yet more

You don't have to try to match quotes, and you have fewer exceptions to parse. This simplifies your code, too.
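As a rough sketch of how this could look in practice, Python's csv module can be configured for backslash escaping instead of quoting. The helper names write_escaped and read_escaped are invented for this example, and both sides (customer and importer) would have to agree on the escape character.

```python
import csv
import io

def write_escaped(fields):
    """Write one row with no quoting; commas inside a field become \\,"""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONE,
                        escapechar="\\", lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()

def read_escaped(line):
    """Read one row, treating a backslash as 'take the next character literally'."""
    reader = csv.reader([line], quoting=csv.QUOTE_NONE, escapechar="\\")
    return next(reader)

line = write_escaped(["data", "more data", "more data, even", "yet more"])
print(line, end="")
# prints: data,more data,more data\, even,yet more
print(read_escaped(line.rstrip("\n")))
# prints: ['data', 'more data', 'more data, even', 'yet more']
```

One trade-off to keep in mind: spreadsheet tools such as Excel generally do not produce or understand backslash escapes, so this only helps when you control how the file is written.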

裙下三千臣
#7 · 2018-12-31 02:16

If you are on a *nix system with access to sed, and the unwanted commas can occur only in one specific field of your CSV, you can use the following one-liner to enclose that field in " as RFC 4180 Section 2 proposes:

sed -r 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/' inputfile

Depending on which field the unwanted comma(s) may be in, you have to alter or extend the capturing groups of the regex (and the substitution).
The example above encloses the fourth field (out of six) in quotation marks.

In combination with the --in-place option, you can apply these changes directly to the file.

In order to "build" the right regex, there's a simple principle to follow:

  1. For every field in your CSV that comes before the field with the unwanted comma(s) you write one [^,]*, and put them all together in a capturing group.
  2. For the field that contains the unwanted comma(s) you write (.*).
  3. For every field after the field with the unwanted comma(s) you write one ,.* and put them all together in a capturing group.

Here is a short overview of different possible regexes/substitutions depending on the specific field. If not given, the substitution is \1"\2"\3.

([^,]*)(,.*)                     #first field, regex
"\1"\2                           #first field, substitution

(.*,)([^,]*)                     #last field, regex
\1"\2"                           #last field, substitution


([^,]*,)(.*)(,.*,.*,.*)          #second field (out of five fields)
([^,]*,[^,]*,)(.*)(,.*)          #third field (out of four fields)
([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*) #fourth field (out of six fields)
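The three-step principle can also be mechanized. The following is only a sketch in Python rather than sed: build_regex and quote_field are hypothetical helper names, the pattern is anchored with ^ and $ (which the sed one-liner leaves implicit), and the three steps are applied uniformly to every position, including the first and last field.

```python
import re

def build_regex(field_index, field_count):
    """Build the pattern for the 1-based field_index out of field_count fields."""
    before = "[^,]*," * (field_index - 1)        # step 1: fields before the target
    after = ",.*" * (field_count - field_index)  # step 3: fields after the target
    return "^(" + before + ")(.*)(" + after + ")$"  # step 2: (.*) is the target

def quote_field(line, field_index, field_count):
    """Enclose the chosen field in double quotes; leave non-matching lines as-is."""
    return re.sub(build_regex(field_index, field_count), r'\1"\2"\3', line, count=1)

print(build_regex(4, 6))
# prints: ^([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)$
print(quote_field("a,b,c,Acme, Inc.,e,f", 4, 6))
# prints: a,b,c,"Acme, Inc.",e,f
```

Like the sed version, this assumes that exactly one field can contain stray commas and that the field itself contains no double quotes.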

If you want to remove the unwanted comma(s) with sed instead of enclosing them in quotation marks, refer to this answer.
