I have a seemingly simple problem: splitting a comma-separated String into tokens, where the output should include empty tokens in the following cases:
- The first character in the String is a comma.
- The last character in the String is a comma.
- Two consecutive commas occur.
For example, the String ",abd,def,,ghi," should yield the output {"", "abd", "def", "", "ghi", ""}.
I have tried using String.split, Scanner and StringTokenizer for this, but each gives a different undesired output (examples below). Can anyone suggest an elegant solution, preferably using JDK classes? Obviously I could code something myself, but I feel like I'm missing something in one of the three approaches mentioned. Note that the delimiter is a fixed String, although not necessarily a comma, nor a single character.
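To pin down the required behaviour, here is a minimal sketch of the "code something myself" option, using only String.indexOf and substring. The class name SplitKeepEmpty and its split method are hypothetical, chosen for illustration. Because indexOf does a plain substring search, the delimiter is treated literally and may be any non-empty String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitKeepEmpty {

    // Hypothetical helper: splits s on every occurrence of the literal
    // (non-regex) delimiter, keeping leading, trailing and consecutive
    // empty tokens.
    static String[] split(String s, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int idx;
        // indexOf performs a plain substring search, so no regex escaping is needed
        while ((idx = s.indexOf(delimiter, start)) != -1) {
            tokens.add(s.substring(start, idx));
            start = idx + delimiter.length();
        }
        // Whatever follows the last delimiter (possibly "") is the final token
        tokens.add(s.substring(start));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";
        System.out.println(Arrays.asList(split(s, ",")));
    }
}
```

Running main prints [, abd, def, , ghi, ], i.e. six tokens, matching the expected {"", "abd", "def", "", "ghi", ""}.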
Example Code
import java.util.*;

public class Main12 {
    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";
        String[] tokens = s.split(",");
        System.err.println("--- String.split Output ---");
        System.err.println(String.format("%s -> %s", s, Arrays.asList(tokens)));
        for (int i = 0; i < tokens.length; ++i) {
            System.err.println(String.format("tokens[%d] = %s", i, tokens[i]));
        }

        System.err.println("--- Scanner Output ---");
        Scanner sc = new Scanner(s);
        sc.useDelimiter(",");
        while (sc.hasNext()) {
            System.err.println(sc.next());
        }

        System.err.println("--- StringTokenizer Output ---");
        StringTokenizer tok = new StringTokenizer(s, ",");
        while (tok.hasMoreTokens()) {
            System.err.println(tok.nextToken());
        }
    }
}
Output
$ java Main12
--- String.split Output ---
,abd,def,,ghi, -> [, abd, def, , ghi]
tokens[0] =
tokens[1] = abd
tokens[2] = def
tokens[3] =
tokens[4] = ghi
--- Scanner Output ---
abd
def
ghi
--- StringTokenizer Output ---
abd
def
ghi
Pass -1 to split as the limit argument, i.e. s.split(",", -1). Then your result array will include any trailing empty strings.
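A minimal demo comparing the default split with a limit of -1 on the question's example string. The class name SplitLimitDemo and the helper splitKeepingTrailing are illustrative names, not part of the JDK; the only JDK call involved is String.split(String regex, int limit).

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Illustrative helper: a negative limit applies the pattern as many times
    // as possible and keeps trailing empty strings.
    static String[] splitKeepingTrailing(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";

        // limit 0 (what split(regex) uses): trailing empty strings are dropped
        String[] dropped = s.split(",");
        // limit -1: the trailing empty string after the last comma is kept
        String[] kept = splitKeepingTrailing(s, ",");

        System.out.println(dropped.length + " " + Arrays.asList(dropped));
        System.out.println(kept.length + " " + Arrays.asList(kept));
    }
}
```

This prints 5 [, abd, def, , ghi] followed by 6 [, abd, def, , ghi, ]. The leading and middle empty tokens are present in both cases; only the trailing one depends on the limit.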
From the javadocs: calling split(regex) acts as if the limit argument is 0, so trailing empty strings are discarded.
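Since the question says the delimiter is a fixed String that may not be a comma or a single character, note that split interprets its first argument as a regular expression. A delimiter such as "|" or "." would then be treated as regex syntax rather than literal text. A minimal sketch, assuming the delimiter should always be matched literally, is to wrap it with java.util.regex.Pattern.quote and keep the -1 limit; the class QuotedSplit and method splitLiteral are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedSplit {

    // Hypothetical helper: Pattern.quote makes the delimiter match literally,
    // and the -1 limit keeps trailing empty strings.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Multi-character delimiter
        System.out.println(Arrays.asList(splitLiteral("||abd||def||||ghi||", "||")));
        // "." unquoted would match every character; quoted, it is a literal dot
        System.out.println(Arrays.asList(splitLiteral(".abd.def..ghi.", ".")));
    }
}
```

Both lines print [, abd, def, , ghi, ].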