How to split a string, but also keep the delimiter

2019-01-01 09:47发布

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)

I can split this string into its parts, using String.split, but it seems that I can't get the actual string, which matched the delimiter regex.

In other words, this is what I get:

  • Text1
  • Text2
  • Text3
  • Text4

This is what I want

  • Text1
  • DelimiterA
  • Text2
  • DelimiterC
  • Text3
  • DelimiterB
  • Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?

标签: java
23条回答
十年一品温如言
2楼-- · 2019-01-01 09:50
import java.util.regex.*;
import java.util.LinkedList;

public class Splitter {
    private static final Pattern DEFAULT_PATTERN = Pattern.compile("\\s+");

    private Pattern pattern;
    private boolean keep_delimiters;

    public Splitter(Pattern pattern, boolean keep_delimiters) {
        this.pattern = pattern;
        this.keep_delimiters = keep_delimiters;
    }
    public Splitter(String pattern, boolean keep_delimiters) {
        this(Pattern.compile(pattern==null?"":pattern), keep_delimiters);
    }
    public Splitter(Pattern pattern) { this(pattern, true); }
    public Splitter(String pattern) { this(pattern, true); }
    public Splitter(boolean keep_delimiters) { this(DEFAULT_PATTERN, keep_delimiters); }
    public Splitter() { this(DEFAULT_PATTERN); }

    public String[] split(String text) {
        if (text == null) {
            text = "";
        }

        int last_match = 0;
        LinkedList<String> splitted = new LinkedList<String>();

        Matcher m = this.pattern.matcher(text);

        while (m.find()) {

            splitted.add(text.substring(last_match,m.start()));

            if (this.keep_delimiters) {
                splitted.add(m.group());
            }

            last_match = m.end();
        }

        splitted.add(text.substring(last_match));

        return splitted.toArray(new String[splitted.size()]);
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.err.println("Syntax: java Splitter <pattern> <text>");
            return;
        }

        Pattern pattern = null;
        try {
            pattern = Pattern.compile(argv[0]);
        }
        catch (PatternSyntaxException e) {
            System.err.println(e);
            return;
        }

        Splitter splitter = new Splitter(pattern);

        String text = argv[1];
        int counter = 1;
        for (String part : splitter.split(text)) {
            System.out.printf("Part %d: \"%s\"\n", counter++, part);
        }
    }
}

/*
    Example:
    > java Splitter "\W+" "Hello World!"
    Part 1: "Hello"
    Part 2: " "
    Part 3: "World"
    Part 4: "!"
    Part 5: ""
*/

I don't really like the other way, where you get an empty element in front and back. A delimiter is usually not at the beginning or at the end of the string, thus you most often end up wasting two good array slots.

Edit: Fixed limit cases. Commented source with test cases can be found here: http://snippets.dzone.com/posts/show/6453

查看更多
梦该遗忘
3楼-- · 2019-01-01 09:51

If you can afford, use Java's replace(CharSequence target, CharSequence replacement) method and fill in another delimiter to split with. Example: I want to split the string "boo:and:foo" and keep ':' at its righthand String.

String str = "boo:and:foo";
str = str.replace(":","newdelimiter:");
String[] tokens = str.split("newdelimiter");

Important note: This only works if you have no further "newdelimiter" in your String! Thus, it is not a general solution. But if you know a CharSequence of which you can be sure that it will never appear in the String, this is a very simple solution.

查看更多
ら面具成の殇う
4楼-- · 2019-01-01 09:53

An extremely naive and inefficient solution which works nevertheless.Use split twice on the string and then concatenate the two arrays

String temp[]=str.split("\\W");
String temp2[]=str.split("\\w||\\s");
int i=0;
for(String string:temp)
System.out.println(string);
String temp3[]=new String[temp.length-1];
for(String string:temp2)
{
        System.out.println(string);
        if((string.equals("")!=true)&&(string.equals("\\s")!=true))
        {
                temp3[i]=string;
                i++;
        }
//      System.out.println(temp.length);
//      System.out.println(temp2.length);
}
System.out.println(temp3.length);
String[] temp4=new String[temp.length+temp3.length];
int j=0;
for(i=0;i<temp.length;i++)
{
        temp4[j]=temp[i];
        j=j+2;
}
j=1;
for(i=0;i<temp3.length;i++)
{
        temp4[j]=temp3[i];
        j+=2;
}
for(String s:temp4)
System.out.println(s);
查看更多
零度萤火
5楼-- · 2019-01-01 09:54

I had a look at the above answers and honestly none of them I find satisfactory. What you want to do is essentially mimic the Perl split functionality. Why Java doesn't allow this and have a join() method somewhere is beyond me but I digress. You don't even need a class for this really. Its just a function. Run this sample program:

Some of the earlier answers have excessive null-checking, which I recently wrote a response to a question here:

https://stackoverflow.com/users/18393/cletus

Anyway, the code:

public class Split {
    public static List<String> split(String s, String pattern) {
        assert s != null;
        assert pattern != null;
        return split(s, Pattern.compile(pattern));
    }

    public static List<String> split(String s, Pattern pattern) {
        assert s != null;
        assert pattern != null;
        Matcher m = pattern.matcher(s);
        List<String> ret = new ArrayList<String>();
        int start = 0;
        while (m.find()) {
            ret.add(s.substring(start, m.start()));
            ret.add(m.group());
            start = m.end();
        }
        ret.add(start >= s.length() ? "" : s.substring(start));
        return ret;
    }

    private static void testSplit(String s, String pattern) {
        System.out.printf("Splitting '%s' with pattern '%s'%n", s, pattern);
        List<String> tokens = split(s, pattern);
        System.out.printf("Found %d matches%n", tokens.size());
        int i = 0;
        for (String token : tokens) {
            System.out.printf("  %d/%d: '%s'%n", ++i, tokens.size(), token);
        }
        System.out.println();
    }

    public static void main(String args[]) {
        testSplit("abcdefghij", "z"); // "abcdefghij"
        testSplit("abcdefghij", "f"); // "abcde", "f", "ghi"
        testSplit("abcdefghij", "j"); // "abcdefghi", "j", ""
        testSplit("abcdefghij", "a"); // "", "a", "bcdefghij"
        testSplit("abcdefghij", "[bdfh]"); // "a", "b", "c", "d", "e", "f", "g", "h", "ij"
    }
}
查看更多
心情的温度
6楼-- · 2019-01-01 09:54

I don't know Java too well, but if you can't find a Split method that does that, I suggest you just make your own.

string[] mySplit(string s,string delimiter)
{
    string[] result = s.Split(delimiter);
    for(int i=0;i<result.Length-1;i++)
    {
        result[i] += delimiter; //this one would add the delimiter to each items end except the last item, 
                    //you can modify it however you want
    }
}
string[] res = mySplit(myString,myDelimiter);

Its not too elegant, but it'll do.

查看更多
只靠听说
7楼-- · 2019-01-01 09:56

You want to use lookarounds, and split on zero-width matches. Here are some examples:

public class SplitNDump {
    static void dump(String[] arr) {
        for (String s : arr) {
            System.out.format("[%s]", s);
        }
        System.out.println();
    }
    public static void main(String[] args) {
        dump("1,234,567,890".split(","));
        // "[1][234][567][890]"
        dump("1,234,567,890".split("(?=,)"));   
        // "[1][,234][,567][,890]"
        dump("1,234,567,890".split("(?<=,)"));  
        // "[1,][234,][567,][890]"
        dump("1,234,567,890".split("(?<=,)|(?=,)"));
        // "[1][,][234][,][567][,][890]"

        dump(":a:bb::c:".split("(?=:)|(?<=:)"));
        // "[][:][a][:][bb][:][:][c][:]"
        dump(":a:bb::c:".split("(?=(?!^):)|(?<=:)"));
        // "[:][a][:][bb][:][:][c][:]"
        dump(":::a::::b  b::c:".split("(?=(?!^):)(?<!:)|(?!:)(?<=:)"));
        // "[:::][a][::::][b  b][::][c][:]"
        dump("a,bb:::c  d..e".split("(?!^)\\b"));
        // "[a][,][bb][:::][c][  ][d][..][e]"

        dump("ArrayIndexOutOfBoundsException".split("(?<=[a-z])(?=[A-Z])"));
        // "[Array][Index][Out][Of][Bounds][Exception]"
        dump("1234567890".split("(?<=\\G.{4})"));   
        // "[1234][5678][90]"

        // Split at the end of each run of letter
        dump("Boooyaaaah! Yippieeee!!".split("(?<=(?=(.)\\1(?!\\1))..)"));
        // "[Booo][yaaaa][h! Yipp][ieeee][!!]"
    }
}

And yes, that is triply-nested assertion there in the last pattern.

Related questions

See also

查看更多
登录 后发表回答