Splitting comma separated string, ignore commas in

Posted 2019-07-29 13:16

I have searched through several posts on Stack Overflow on how to split a string on a comma delimiter while ignoring commas inside quotes (see: How do I split a string into an array by comma but ignore commas inside double quotes?). I am trying to achieve similar results, but I also need to allow for a string that contains a single, unmatched double quote.

I.e., I need "test05, \"test, 05\", test\", test 05" to split into

  • test05
  • "test, 05"
  • test"
  • test 05

I tried a similar method to one mentioned here:

Regex for splitting a string using space when not surrounded by single or double quotes

That approach uses Matcher instead of split(). However, that specific example splits on spaces, not on commas. I've tried to adjust the pattern to account for commas instead, but have not had any luck.

String str = "test05, \"test, 05\", test\", test 05";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|([^,]+?)),++").matcher(str);

for (int i = 0; i < len; i++)
{
    m.region(i, len);

    if (m.lookingAt())
    {
        String s = m.group(1);

        if ((s.startsWith("\"") && s.endsWith("\"")))
        {
            s = s.substring(1, s.length() - 1);
        }

        System.out.println(i + ": \"" + s + "\"");
        i += (m.group(0).length() - 1);
    }
}

5 answers
疯言疯语
#2 · 2019-07-29 13:25

Split against this pattern:

(?<=\"?),(?!\")|(?<!\"),(?=\")

so it will be:

String[] splitArray = subjectString.split("(?<=\"?),(?!\")|(?<!\"),(?=\")");

UPD: Given the recent changes to the question, it's better not to use a bare split. First separate the text inside quotes from the text outside quotes, then do a simple split(",") on the latter. Use a plain for loop that counts the quotes seen so far while saving the characters read into a StringBuffer. Save characters into the StringBuffer until you meet a quote, then put its contents into an array of strings that were not in quotes. Then create a new StringBuffer and save the following characters into it; when you meet the second quote, stop and put the new StringBuffer into an array of strings that were in quotes. Repeat until the end of the string. You will end up with two arrays: one with the strings that were in quotes, the other with the strings that were not. Then split all elements of the second array on commas.
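
A character-by-character scan along these lines could be sketched roughly as follows. This is only a minimal illustration, not the answerer's exact two-array procedure: it collects everything into a single list in one pass. The class name QuoteAwareSplitter and the method split are hypothetical, and the rule "a quote with no later partner is a literal character" is an assumption chosen to match the example in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
class QuoteAwareSplitter {

    // Splits on commas, except commas that sit between a pair of double quotes.
    // Assumption: a quote that has no later partner is kept as a literal character.
    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                if (inQuotes) {
                    inQuotes = false;                 // closing quote of a pair
                } else if (input.indexOf('"', i + 1) >= 0) {
                    inQuotes = true;                  // opening quote: a partner exists later
                }
                current.append(c);                    // quotes are kept in the output
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim()); // unquoted comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());        // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String str = "test05, \"test, 05\", test\", test 05";
        for (String field : split(str)) {
            System.out.println(field);
        }
    }
}
```

For the string from the question this prints test05, "test, 05", test" and test 05 on separate lines. One limitation of the pairing rule assumed here: a lone quote that is followed later by a properly quoted field would be paired with that field's opening quote, so input of that shape needs a stricter rule.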

Lonely孤独者°
#3 · 2019-07-29 13:28

You have reached the point where regular expressions break down.

I would recommend that you write a simple splitter instead which handles your special cases as you wish. Test Driven Development is great for doing this.

It looks, however, like you are trying to parse CSV lines. Have you considered using a CSV-library for this?

爱情/是我丢掉的垃圾
#4 · 2019-07-29 13:37

I've had similar issues with this, and I found no good .NET solution, so I went DIY.

In my application I'm parsing a CSV, so my delimiter is ",". I suppose this method only works where you have a single-character delimiter.

So I've written a function that ignores commas within double quotes. It does this by converting the input string into a character array and parsing it character by character.

// Requires: using System.Collections.Generic; using System.Text;
public static string[] Splitter_IgnoreQuotes(string stringToSplit)
{
    char[] charsOfData = stringToSplit.ToCharArray();
    // A growable list avoids having to guess the number of fields up front.
    var fields = new List<string>();
    var current = new StringBuilder();
    bool insideQuotes = false;

    foreach (char theChar in charsOfData)
    {
        if (theChar == '"')
        {
            // Every double quote toggles the quoted state; the quote itself is dropped.
            insideQuotes = !insideQuotes;
        }
        else if (theChar == ',' && !insideQuotes)
        {
            // A comma outside quotes ends the current field.
            // ',' could be made a parameter; it is hard-coded because I'm working with a CSV.
            fields.Add(current.ToString());
            current.Clear();
        }
        else
        {
            // Any other character, including a comma inside quotes, belongs to the field.
            current.Append(theChar);
        }
    }

    // The last field is not followed by a comma, so add it explicitly.
    fields.Add(current.ToString());
    return fields.ToArray();
}

To suit my application, this function also strips the double quotes (") from the output, since they are present in my input but not needed.
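
Since the question is about Java, a rough Java equivalent of the C# function above might look like the sketch below. The class name ToggleQuoteSplitter and the method split are hypothetical names chosen for illustration, and the sketch assumes a growable list for the result rather than a fixed-size array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java port of the C# function above; names are illustrative only.
class ToggleQuoteSplitter {

    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;

        for (char c : input.toCharArray()) {
            if (c == '"') {
                insideQuotes = !insideQuotes;    // every quote flips the state and is dropped
            } else if (c == ',' && !insideQuotes) {
                fields.add(current.toString());  // a comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);               // includes commas inside quotes
            }
        }
        fields.add(current.toString());          // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a,\"b,c\",d"));
    }
}
```

For well-formed input such as a,"b,c",d this yields the three fields a, b,c and d. Because every quote toggles the state, the string from the question comes out as only three fields: the lone quote after test opens a quoted region that never closes, so the last two fields are merged into one. This approach therefore fits input where quotes always come in pairs.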

放我归山
#5 · 2019-07-29 13:37

Unless you really need to DIY, you should consider the Apache Commons class org.apache.commons.csv.CSVParser

http://commons.apache.org/sandbox/csv/apidocs/org/apache/commons/csv/CSVParser.html

Fickle 薄情
#6 · 2019-07-29 13:44

Try this:

import java.util.regex.*;

public class Main {
  public static void main(String[] args) throws Exception {

    String text = "test05, \"test, 05\", test\", test 05";

    Pattern p = Pattern.compile(
        "(?x)          # enable comments                                      \n" +
        "(\"[^\"]*\")  # quoted data, and store in group #1                   \n" +
        "|             # OR                                                   \n" +
        "([^,]+)       # one or more chars other than ',', and store it in #2 \n" +
        "|             # OR                                                   \n" +
        "\\s*,\\s*     # a ',' optionally surrounded by space-chars           \n"
    );

    Matcher m = p.matcher(text);

    while (m.find()) {
      // get the match
      String matched = m.group().trim();

      // only print the match if it's group #1 or #2
      if(m.group(1) != null || m.group(2) != null) {
        System.out.println(matched);
      }
    }
  }
}

For test05, "test, 05", test", test 05 it produces:

test05
"test, 05"
test"
test 05

and for test05, "test 05", test", test 05 it produces:

test05
"test 05"
test"
test 05