Tokenising a String containing empty tokens

I have a seemingly simple problem: splitting a comma-separated String into tokens, where the output should include empty tokens in cases where:

The first character in the String is a comma.
The last character in the String is a comma.
Two consecutive commas occur.

For example, the String ",abd,def,,ghi," should yield the output: {"", "abd", "def", "", "ghi", ""}.

I have tried using String.split, Scanner and StringTokenizer for this, but each gives a different undesired output (examples below). Can anyone suggest an elegant solution, preferably using JDK classes? Obviously I could code something myself, but I feel like I'm missing something in one of the three approaches mentioned. Note that the delimiter is a fixed String, although not necessarily a comma, nor a single character.

Example Code

import java.util.*;

public class Main12 {
  public static void main(String[] args) {
    String s = ",abd,def,,ghi,";
    String[] tokens = s.split(",");

    System.err.println("--- String.split Output ---");
    System.err.println(String.format("%s -> %s", s, Arrays.asList(tokens)));

    for (int i=0; i<tokens.length; ++i) {
      System.err.println(String.format("tokens[%d] = %s", i, tokens[i]));
    }

    System.err.println("--- Scanner Output ---");

    Scanner sc = new Scanner(s);
    sc.useDelimiter(",");
    while (sc.hasNext()) {
      System.err.println(sc.next());
    }

    System.err.println("--- StringTokenizer Output ---");

    StringTokenizer tok = new StringTokenizer(s, ",");
    while (tok.hasMoreTokens()) {
      System.err.println(tok.nextToken());
    }
  }
}

Output

$ java Main12
--- String.split Output ---
,abd,def,,ghi, -> [, abd, def, , ghi]
tokens[0] =
tokens[1] = abd
tokens[2] = def
tokens[3] =
tokens[4] = ghi
--- Scanner Output ---
abd
def

ghi
--- StringTokenizer Output ---
abd
def
ghi

Tags: java string java.util.scanner stringtokenizer string-split

1 Answer

地球回转人心会变

#2 · 2019-02-09 21:03

Pass a -1 to split as the limit argument:

String s = ",abd,def,,ghi,";
String[] tokens = s.split(",", -1);

Then your result array will include any trailing empty strings.
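
As a quick check, here is a minimal sketch that runs the question's input through split with a limit of -1. The class name SplitKeepEmpty and the helper tokenise are made up for illustration; only String.split(String, int) comes from the JDK.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is not from the original post.
class SplitKeepEmpty {

    // Splits on commas and keeps leading, inner and trailing empty tokens.
    static String[] tokenise(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";
        String[] tokens = tokenise(s);

        // Six tokens are expected: "", "abd", "def", "", "ghi", ""
        System.out.println("count = " + tokens.length);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(String.format("tokens[%d] = \"%s\"", i, tokens[i]));
        }
        System.out.println(Arrays.asList(tokens));
    }
}
```

The quotes in the format string are only there to make the empty tokens visible in the output.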

From the javadocs:

If [the limit] is non-positive then the pattern will be applied as many times as possible and the array can have any length. If [the limit] is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

Calling split(regex) acts as if the limit argument is 0, so trailing empty strings are discarded.
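
Since the question mentions that the delimiter is a fixed String that may be longer than one character, note that the first argument to split is a regular expression. A rough sketch of one way to handle that, assuming the delimiter should be matched literally, is to wrap it with Pattern.quote. The class name LiteralDelimiterSplit, the helper splitLiteral and the "||" delimiter are illustrative assumptions, not part of the original post.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; names and the "||" delimiter are assumptions.
class LiteralDelimiterSplit {

    // Treats the delimiter as literal text (not a regex) and keeps all empty tokens.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";

        // split(regex) and split(regex, 0) behave the same: trailing empty strings are dropped.
        System.out.println(Arrays.equals(s.split(","), s.split(",", 0)));

        // "|" is a regex metacharacter, so an unquoted "||" would not match the literal text.
        String[] tokens = splitLiteral("||abd||def||||ghi||", "||");
        System.out.println("count = " + tokens.length);
        System.out.println(Arrays.asList(tokens));
    }
}
```

For a plain comma the quoting makes no difference, but it avoids surprises with delimiters such as "|", "." or "*".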
