How to split a string, but also keep the delimiter

2018-12-31 02:20发布

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)

I can split this string into its parts, using String.split, but it seems that I can't get the actual string, which matched the delimiter regex.

In other words, this is what I get:

  • Text1
  • Text2
  • Text3
  • Text4

This is what I want

  • Text1
  • DelimiterA
  • Text2
  • DelimiterC
  • Text3
  • DelimiterB
  • Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?

标签: java
23条回答
查无此人
2楼-- · 2018-12-31 02:32

Tweaked Pattern.split() to include matched pattern to the list

Added

// add match to the list
        matchList.add(input.subSequence(start, end).toString());

Full source

public static String[] inclusiveSplit(String input, String re, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<String>();

    Pattern pattern = Pattern.compile(re);
    Matcher m = pattern.matcher(input);

    // Add segments before each match found
    while (m.find()) {
        int end = m.end();
        if (!matchLimited || matchList.size() < limit - 1) {
            int start = m.start();
            String match = input.subSequence(index, start).toString();
            matchList.add(match);
            // add match to the list
            matchList.add(input.subSequence(start, end).toString());
            index = end;
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index, input.length())
                    .toString();
            matchList.add(match);
            index = end;
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] { input.toString() };

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
查看更多
梦醉为红颜
3楼-- · 2018-12-31 02:32

An extremely naive and inefficient solution which works nevertheless.Use split twice on the string and then concatenate the two arrays

String temp[]=str.split("\\W");
String temp2[]=str.split("\\w||\\s");
int i=0;
for(String string:temp)
System.out.println(string);
String temp3[]=new String[temp.length-1];
for(String string:temp2)
{
        System.out.println(string);
        if((string.equals("")!=true)&&(string.equals("\\s")!=true))
        {
                temp3[i]=string;
                i++;
        }
//      System.out.println(temp.length);
//      System.out.println(temp2.length);
}
System.out.println(temp3.length);
String[] temp4=new String[temp.length+temp3.length];
int j=0;
for(i=0;i<temp.length;i++)
{
        temp4[j]=temp[i];
        j=j+2;
}
j=1;
for(i=0;i<temp3.length;i++)
{
        temp4[j]=temp3[i];
        j+=2;
}
for(String s:temp4)
System.out.println(s);
查看更多
余生请多指教
4楼-- · 2018-12-31 02:34

I got here late, but returning to the original question, why not just use lookarounds?

Pattern p = Pattern.compile("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
System.out.println(Arrays.toString(p.split("'ab','cd','eg'")));
System.out.println(Arrays.toString(p.split("boo:and:foo")));

output:

[', ab, ',', cd, ',', eg, ']
[boo, :, and, :, foo]

EDIT: What you see above is what appears on the command line when I run that code, but I now see that it's a bit confusing. It's difficult to keep track of which commas are part of the result and which were added by Arrays.toString(). SO's syntax highlighting isn't helping either. In hopes of getting the highlighting to work with me instead of against me, here's how those arrays would look it I were declaring them in source code:

{ "'", "ab", "','", "cd", "','", "eg", "'" }
{ "boo", ":", "and", ":", "foo" }

I hope that's easier to read. Thanks for the heads-up, @finnw.

查看更多
笑指拈花
5楼-- · 2018-12-31 02:35

I had a look at the above answers and honestly none of them I find satisfactory. What you want to do is essentially mimic the Perl split functionality. Why Java doesn't allow this and have a join() method somewhere is beyond me but I digress. You don't even need a class for this really. Its just a function. Run this sample program:

Some of the earlier answers have excessive null-checking, which I recently wrote a response to a question here:

https://stackoverflow.com/users/18393/cletus

Anyway, the code:

public class Split {
    public static List<String> split(String s, String pattern) {
        assert s != null;
        assert pattern != null;
        return split(s, Pattern.compile(pattern));
    }

    public static List<String> split(String s, Pattern pattern) {
        assert s != null;
        assert pattern != null;
        Matcher m = pattern.matcher(s);
        List<String> ret = new ArrayList<String>();
        int start = 0;
        while (m.find()) {
            ret.add(s.substring(start, m.start()));
            ret.add(m.group());
            start = m.end();
        }
        ret.add(start >= s.length() ? "" : s.substring(start));
        return ret;
    }

    private static void testSplit(String s, String pattern) {
        System.out.printf("Splitting '%s' with pattern '%s'%n", s, pattern);
        List<String> tokens = split(s, pattern);
        System.out.printf("Found %d matches%n", tokens.size());
        int i = 0;
        for (String token : tokens) {
            System.out.printf("  %d/%d: '%s'%n", ++i, tokens.size(), token);
        }
        System.out.println();
    }

    public static void main(String args[]) {
        testSplit("abcdefghij", "z"); // "abcdefghij"
        testSplit("abcdefghij", "f"); // "abcde", "f", "ghi"
        testSplit("abcdefghij", "j"); // "abcdefghi", "j", ""
        testSplit("abcdefghij", "a"); // "", "a", "bcdefghij"
        testSplit("abcdefghij", "[bdfh]"); // "a", "b", "c", "d", "e", "f", "g", "h", "ij"
    }
}
查看更多
无色无味的生活
6楼-- · 2018-12-31 02:35

I don't think it is possible with String#split, but you can use a StringTokenizer, though that won't allow you to define your delimiter as a regex, but only as a class of single-digit characters:

new StringTokenizer("Hello, world. Hi!", ",.!", true); // true for returnDelims
查看更多
一个人的天荒地老
7楼-- · 2018-12-31 02:38

Pass the 3rd aurgument as "true". It will return delimiters as well.

StringTokenizer(String str, String delimiters, true);
查看更多
登录 后发表回答