Regex to split a CSV

2019-01-03 02:35发布

I know this (or similar) has been asked many times but having tried out numerous possibilities I've not been able to find a a regex that works 100%.

I've got a CSV file and I'm trying to split it into an array, but encountering two problems: quoted commas and empty elements.

The CSV looks like:

123,2.99,AMO024,Title,"Description, more info",,123987564

The regex I've tried to use is:

thisLine.split(/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/)

The only problem is that in my output array the 5th element comes out as 123987564 and not an empty string.

17条回答
虎瘦雄心在
2楼-- · 2019-01-03 03:14

I created this a few months ago for a project.

 ".+?"|[^"]+?(?=,)|(?<=,)[^"]+

Regular expression visualization

It works in C# and the Debuggex was happy when I selected Python and PCRE. Javascript doesn't recognize this form of Proceeded By ?<=....

For your values, it will create matches on

123
,2.99
,AMO024
,Title
"Description, more info"
,
,123987564

Note that anything in quotes doesn't have a leading comma, but attempting to match with a leading comma was required for the empty value use case. Once done, trim values as necessary.

I use RegexHero.Net to test my Regex.

查看更多
来,给爷笑一个
3楼-- · 2019-01-03 03:17

The advantage of using JScript for classic ASP pages is that you can use one of the many, many libraries that have been written for JavaScript.

Like this one: https://github.com/gkindel/CSV-JS. Download it, include it in your ASP page, parse CSV with it.

<%@ language="javascript" %>

<script language="javascript" runat="server" src="scripts/csv.js"></script>
<script language="javascript" runat="server">

var text = '123,2.99,AMO024,Title,"Description, more info",,123987564',
    rows = CSV.parse(line);

    Response.Write(rows[0][4]);
</script>
查看更多
做个烂人
4楼-- · 2019-01-03 03:17

This one matches all i need in c#:

(?<=(^|,)(?<quote>"?))([^"]|(""))*?(?=\<quote>(?=,|$))
  • strips quotes
  • lets new lines
  • lets double quotes in the quoted string
  • lets commas in the quoted string
查看更多
可以哭但决不认输i
5楼-- · 2019-01-03 03:18

Worked on this for a bit and came up with this solution:

(?:,|\n|^)("(?:(?:"")*[^"]*)*"|[^",\n]*|(?:\n|$))

Try it out here!

This solution handles "nice" CSV data like

"a","b",c,"d",e,f,,"g"

0: "a"
1: "b"
2: c
3: "d"
4: e
5: f
6:
7: "g"

and uglier things like

"""test"" one",test' two,"""test"" 'three'","""test 'four'"""

0: """test"" one"
1: test' two
2: """test"" 'three'"
3: """test 'four'"""

Here's an explanation of how it works:

(?:,|\n|^)      # all values must start at the beginning of the file,  
                #   the end of the previous line, or at a comma  
(               # single capture group for ease of use; CSV can be either...  
  "             # ...(A) a double quoted string, beginning with a double quote (")  
    (?:         #        character, containing any number (0+) of  
      (?:"")*   #          escaped double quotes (""), or  
      [^"]*     #          non-double quote characters  
    )*          #        in any order and any number of times  
  "             #        and ending with a double quote character  

  |             # ...or (B) a non-quoted value  

  [^",\n]*      # containing any number of characters which are not  
                # double quotes ("), commas (,), or newlines (\n)  

  |             # ...or (C) a single newline or end-of-file character,  
                #           used to capture empty values at the end of  
  (?:\n|$)      #           the file or at the ends of lines  
)
查看更多
不美不萌又怎样
6楼-- · 2019-01-03 03:20

If i try the regex posted by @chubbsondubs on http://regex101.com using the 'g' flag, there are matches, that contain only ',' or an empty string. With this regex:
(?:"([^"]*)"|([^,]*))(?:[,])
i can match the parts of the CSV (inlcuding quoted parts). (The line must be terminated with a ',' otherwise the last part isn't recognized.)
https://regex101.com/r/dF9kQ8/4
If the CSV looks like:
"",huhu,"hel lo",world,
there are 4 matches:
''
'huhu'
'hel lo'
'world'

查看更多
登录 后发表回答