Regular expression for CSV with commas and no quotes

Posted 2019-08-22 03:04

I'm trying to parse a really complicated CSV file that is generated without quotes around columns that contain commas.
The only hint I have is that a comma with whitespace before or after it is part of the field.

Jake,HomePC,Microsoft VS2010, Microsoft Office 2010

should be parsed into:

Jake
HomePC
Microsoft VS2010, Microsoft Office 2010
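The rule can also be applied without a regular expression. Below is a minimal Python sketch, assuming that a comma is a separator only when neither neighbouring character is whitespace, and that a comma at the very start or end of the line also counts as a separator. `split_unquoted_csv` is a hypothetical helper name, not part of any library.

```python
def split_unquoted_csv(line):
    """Split a CSV line on commas that have no whitespace on either side.

    Assumption: a comma touching whitespace belongs to the field text;
    a comma at the start or end of the line is treated as a separator.
    """
    fields = []
    start = 0
    for i, ch in enumerate(line):
        if ch != ",":
            continue
        before = line[i - 1] if i > 0 else ""
        after = line[i + 1] if i + 1 < len(line) else ""
        # "".isspace() is False, so line edges never block a split
        if before.isspace() or after.isspace():
            continue  # comma touches whitespace: keep it inside the field
        fields.append(line[start:i])
        start = i + 1
    fields.append(line[start:])  # last field
    return fields


print(split_unquoted_csv("Jake,HomePC,Microsoft VS2010, Microsoft Office 2010"))
# ['Jake', 'HomePC', 'Microsoft VS2010, Microsoft Office 2010']
```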

Can anybody please advise on how to include "\s," and ",\s" in the column body?

3 answers
孤傲高冷的网名
Post #2 · 2019-08-22 03:38

If your language supports lookbehind assertions, split on

(?<!\s),(?!\s)

In C#:

using System.Text.RegularExpressions;

// subjectString holds one line of the CSV file
string[] splitArray = Regex.Split(subjectString,
    @"(?<!\s) # Assert that the previous character isn't whitespace
    ,         # Match a comma
    (?!\s)    # Assert that the following character isn't whitespace",
    RegexOptions.IgnorePatternWhitespace);
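For comparison, here is a sketch of the same pattern in Python, using `re.VERBOSE` as the counterpart of `RegexOptions.IgnorePatternWhitespace`. Python's `re` module accepts this lookbehind because `\s` always matches exactly one character. The input is the line from the question.

```python
import re

# Same pattern as the C# example, written in verbose mode so it can carry comments.
SEPARATOR = re.compile(r"""
    (?<!\s)  # previous character is not whitespace
    ,        # the comma itself
    (?!\s)   # next character is not whitespace
""", re.VERBOSE)

line = "Jake,HomePC,Microsoft VS2010, Microsoft Office 2010"
fields = SEPARATOR.split(line)
print(fields)
# ['Jake', 'HomePC', 'Microsoft VS2010, Microsoft Office 2010']
```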
祖国的老花朵
Post #3 · 2019-08-22 03:45

Split on r"(?<!\s),(?!\s)"

In Python you can do it like this:

import re
re.split(r"(?<!\s),(?!\s)", s)  # s is your string
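Why the lookbehind matters: a lookahead written in front of the comma, such as `(?!\s+),`, is evaluated at the comma's own position, where the next character is always the comma, so it never rejects whitespace before the comma. A small Python comparison on a hypothetical line with a space before one comma:

```python
import re

lookahead_only = r"(?!\s+),(?!\s+)"  # leading lookahead only ever sees the comma
with_lookbehind = r"(?<!\s),(?!\s)"  # checks the character before the comma

# Hypothetical input: whitespace *before* the third comma.
line = "Jake,HomePC,Microsoft VS2010 ,Microsoft Office 2010"

print(re.split(lookahead_only, line))
# ['Jake', 'HomePC', 'Microsoft VS2010 ', 'Microsoft Office 2010']
print(re.split(with_lookbehind, line))
# ['Jake', 'HomePC', 'Microsoft VS2010 ,Microsoft Office 2010']
```

Only the lookbehind version keeps `Microsoft VS2010 ,Microsoft Office 2010` as one field, which is what the rule in the question asks for.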
"骚年 ilove
Post #4 · 2019-08-22 03:48

Try this. It gives the result you described.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitCsv {
    public static void main(String[] args) {
        String line = "Jake,HomePC,Microsoft VS2010, Microsoft Office 2010,Microsoft VS2010, Microsoft Office 2010";
        // A separator is a comma with a letter or digit on both sides.
        // Lookarounds keep the neighbouring characters out of the match,
        // so back-to-back separators such as in "a,b,c" are all found.
        Pattern separator = Pattern.compile("(?<=[a-z0-9]),(?=[a-z0-9])", Pattern.CASE_INSENSITIVE);
        Matcher matcher = separator.matcher(line);
        List<String> fields = new ArrayList<>();
        int start = 0;
        while (matcher.find()) {
            fields.add(line.substring(start, matcher.start()));
            start = matcher.end();
        }
        // Last field (or the whole line if no separator was found)
        fields.add(line.substring(start));
        for (String field : fields) {
            System.out.println(field);
        }
    }
}

Please note that the code is in Java.
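The Java answer uses a different rule: a comma is a separator only when it sits between two letters or digits. A minimal Python sketch comparing that rule with the whitespace rule from the question, on a hypothetical line whose second field ends in `)`:

```python
import re

# Rule from the Java answer: a comma with a letter or digit on both sides.
ALNUM_SEPARATOR = re.compile(r"(?<=[a-z0-9]),(?=[a-z0-9])", re.IGNORECASE)
# Rule from the question: a comma with no whitespace on either side.
NO_SPACE_SEPARATOR = re.compile(r"(?<!\s),(?!\s)")

# Hypothetical line: the second field ends with ")" rather than a letter or digit.
line = "Jake,HomePC (old),Microsoft VS2010, Microsoft Office 2010"

print(ALNUM_SEPARATOR.split(line))
# ['Jake', 'HomePC (old),Microsoft VS2010, Microsoft Office 2010']
print(NO_SPACE_SEPARATOR.split(line))
# ['Jake', 'HomePC (old)', 'Microsoft VS2010, Microsoft Office 2010']
```

The alphanumeric rule misses the separator after `(old)`, so it is only safe when every field starts and ends with a letter or digit.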
