I have searched through several posts on Stack Overflow on how to split a string on a comma delimiter while ignoring commas inside quotes (see: How do I split a string into an array by comma but ignore commas inside double quotes?). I am trying to achieve similar results, but I also need to allow for a string that contains a single, unpaired double quote.
For example, I need "test05, \"test, 05\", test\", test 05"
to split into
test05
"test, 05"
test"
test 05
I tried a method similar to the one mentioned here:
Regex for splitting a string using space when not surrounded by single or double quotes
using Matcher instead of split(). However, that specific example splits on spaces, not on commas. I've tried to adjust the pattern to account for commas instead, but have not had any luck.
String str = "test05, \"test, 05\", test\", test 05";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|([^,]+?)),++").matcher(str);
for (int i = 0; i < len; i++)
{
    m.region(i, len);
    if (m.lookingAt())
    {
        String s = m.group(1);
        if ((s.startsWith("\"") && s.endsWith("\"")))
        {
            s = s.substring(1, s.length() - 1);
        }
        System.out.println(i + ": \"" + s + "\"");
        i += (m.group(0).length() - 1);
    }
}
Split against this pattern:
so it will be:
UPD: Given the recent changes to the question, it's better not to use a bare split. First separate the text inside quotes from the text outside quotes, then do a simple split(",") on the latter. Use a plain for loop that keeps track of the quotes you have met while saving the characters you read into a StringBuffer. Initially you append characters to the StringBuffer until you meet a quote; then you put its contents into an array of strings that were not in quotes. Next you create a new StringBuffer and append the following characters to it; when you meet the second quote, you stop and put its contents into an array of strings that were in quotes. Repeat until the end of the string. You will end up with two arrays, one with the strings that were in quotes and one with the strings that were not. Finally, split each element of the second array on commas.
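The loop described above could be sketched roughly like this. The class and method names (QuotedSegments, scan) are made up for illustration, StringBuilder stands in for StringBuffer, and the handling of a quote that is never closed (its trailing text is treated as unquoted) is an assumption, not something the description specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects quoted and unquoted pieces into two lists.
class QuotedSegments {
    final List<String> quoted = new ArrayList<>();
    final List<String> unquoted = new ArrayList<>();

    static QuotedSegments scan(String input) {
        QuotedSegments result = new QuotedSegments();
        StringBuilder buffer = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') {
                // A quote ends the current segment: store it in the matching list.
                if (inQuotes) {
                    result.quoted.add(buffer.toString());
                } else {
                    result.addUnquoted(buffer.toString());
                }
                buffer.setLength(0);
                inQuotes = !inQuotes;
            } else {
                buffer.append(c);
            }
        }
        // Assumption: text after an unclosed quote is treated as unquoted.
        result.addUnquoted(buffer.toString());
        return result;
    }

    // Unquoted text is split on commas; blank pieces are skipped.
    private void addUnquoted(String text) {
        for (String piece : text.split(",")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                unquoted.add(trimmed);
            }
        }
    }
}
```

Note that this keeps the two groups apart, so the original field order is lost, and a lone quote is consumed as a delimiter rather than kept as a literal character.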
You have reached the point where regular expressions break down.
I would recommend that you write a simple splitter instead which handles your special cases as you wish. Test Driven Development is great for doing this.
It looks, however, like you are trying to parse CSV lines. Have you considered using a CSV-library for this?
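A hand-written splitter of the kind suggested above might look something like the following. This is only a sketch: the class name QuoteAwareSplitter is invented, and the rule it uses for the special case (a double quote opens a quoted section only if another double quote appears later in the string, otherwise it is kept as a literal character) is one possible interpretation of the requirement, not the only one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: commas split fields unless they sit between a
// pair of double quotes; an unpaired quote is kept as ordinary text.
class QuoteAwareSplitter {

    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                if (inQuotes) {
                    inQuotes = false;                      // closing quote
                } else if (input.indexOf('"', i + 1) >= 0) {
                    inQuotes = true;                       // opening quote with a partner
                }
                // Otherwise the quote has no partner and is just a character.
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        for (String field : split("test05, \"test, 05\", test\", test 05")) {
            System.out.println(field);
        }
    }
}
```

For the input from the question this yields the four fields test05, "test, 05", test" and test 05. An unpaired quote that comes before a quoted pair would still be ambiguous under this rule, which is the kind of case worth pinning down with tests first.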
I've had similar issues with this, and I found no good .NET solution, so I went DIY.
In my application I'm parsing a CSV, so my delimiter is ",". I suppose this method only works where you have a single-character delimiter.
So I've written a function that ignores commas within double quotes. It does this by converting the input string into a character array and parsing it character by character.
To suit my application, this function also ignores ("") in any input, as these are unneeded but present in my input.
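The original function is not shown here, so the following is only a guess at its shape, written in Java rather than .NET to match the question. The name CharSplitter and the choice to drop every double-quote character from the output are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical char-by-char splitter for a single-character delimiter.
class CharSplitter {

    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                // Toggle the quoted state; the quote itself is not kept.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

Because every quote toggles the state, a lone unpaired quote makes the rest of the line count as quoted, so this variant does not cover the special case in the question.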
Unless you really need to DIY, you should consider the Apache Commons class org.apache.commons.csv.CSVParser
http://commons.apache.org/sandbox/csv/apidocs/org/apache/commons/csv/CSVParser.html
Try this:
For
test05, "test, 05", test", test 05
it produces:
and for
test05, "test 05", test", test 05
it produces: