How to split a string, but also keep the delimiter

2018-12-31 02:20发布

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)

I can split this string into its parts, using String.split, but it seems that I can't get the actual string, which matched the delimiter regex.

In other words, this is what I get:

  • Text1
  • Text2
  • Text3
  • Text4

This is what I want

  • Text1
  • DelimiterA
  • Text2
  • DelimiterC
  • Text3
  • DelimiterB
  • Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?

标签: java
23条回答
荒废的爱情
2楼-- · 2018-12-31 02:40
    String expression = "((A+B)*C-D)*E";
    expression = expression.replaceAll("\\+", "~+~");
    expression = expression.replaceAll("\\*", "~*~");
    expression = expression.replaceAll("-", "~-~");
    expression = expression.replaceAll("/+", "~/~");
    expression = expression.replaceAll("\\(", "~(~"); //also you can use [(] instead of \\(
    expression = expression.replaceAll("\\)", "~)~"); //also you can use [)] instead of \\)
    expression = expression.replaceAll("~~", "~");
    if(expression.startsWith("~")) {
        expression = expression.substring(1);
    }

    String[] expressionArray = expression.split("~");
    System.out.println(Arrays.toString(expressionArray));
查看更多
宁负流年不负卿
3楼-- · 2018-12-31 02:41

I suggest using Pattern and Matcher, which will almost certainly achieve what you want. Your regular expression will need to be somewhat more complicated than what you are using in String.split.

查看更多
高级女魔头
4楼-- · 2018-12-31 02:41

Fast answer: use non physical bounds like \b to split. I will try and experiment to see if it works (used that in PHP and JS).

It is possible, and kind of work, but might split too much. Actually, it depends on the string you want to split and the result you need. Give more details, we will help you better.

Another way is to do your own split, capturing the delimiter (supposing it is variable) and adding it afterward to the result.

My quick test:

String str = "'ab','cd','eg'";
String[] stra = str.split("\\b");
for (String s : stra) System.out.print(s + "|");
System.out.println();

Result:

'|ab|','|cd|','|eg|'|

A bit too much... :-)

查看更多
皆成旧梦
5楼-- · 2018-12-31 02:41

I don't know Java too well, but if you can't find a Split method that does that, I suggest you just make your own.

string[] mySplit(string s,string delimiter)
{
    string[] result = s.Split(delimiter);
    for(int i=0;i<result.Length-1;i++)
    {
        result[i] += delimiter; //this one would add the delimiter to each items end except the last item, 
                    //you can modify it however you want
    }
}
string[] res = mySplit(myString,myDelimiter);

Its not too elegant, but it'll do.

查看更多
怪性笑人.
6楼-- · 2018-12-31 02:43

Here's a groovy version based on some of the code above, in case it helps. It's short, anyway. Conditionally includes the head and tail (if they are not empty). The last part is a demo/test case.

List splitWithTokens(str, pat) {
    def tokens=[]
    def lastMatch=0
    def m = str=~pat
    while (m.find()) {
      if (m.start() > 0) tokens << str[lastMatch..<m.start()]
      tokens << m.group()
      lastMatch=m.end()
    }
    if (lastMatch < str.length()) tokens << str[lastMatch..<str.length()]
    tokens
}

[['<html><head><title>this is the title</title></head>',/<[^>]+>/],
 ['before<html><head><title>this is the title</title></head>after',/<[^>]+>/]
].each { 
   println splitWithTokens(*it)
}
查看更多
何处买醉
7楼-- · 2018-12-31 02:44
import java.util.regex.*;
import java.util.LinkedList;

public class Splitter {
    private static final Pattern DEFAULT_PATTERN = Pattern.compile("\\s+");

    private Pattern pattern;
    private boolean keep_delimiters;

    public Splitter(Pattern pattern, boolean keep_delimiters) {
        this.pattern = pattern;
        this.keep_delimiters = keep_delimiters;
    }
    public Splitter(String pattern, boolean keep_delimiters) {
        this(Pattern.compile(pattern==null?"":pattern), keep_delimiters);
    }
    public Splitter(Pattern pattern) { this(pattern, true); }
    public Splitter(String pattern) { this(pattern, true); }
    public Splitter(boolean keep_delimiters) { this(DEFAULT_PATTERN, keep_delimiters); }
    public Splitter() { this(DEFAULT_PATTERN); }

    public String[] split(String text) {
        if (text == null) {
            text = "";
        }

        int last_match = 0;
        LinkedList<String> splitted = new LinkedList<String>();

        Matcher m = this.pattern.matcher(text);

        while (m.find()) {

            splitted.add(text.substring(last_match,m.start()));

            if (this.keep_delimiters) {
                splitted.add(m.group());
            }

            last_match = m.end();
        }

        splitted.add(text.substring(last_match));

        return splitted.toArray(new String[splitted.size()]);
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.err.println("Syntax: java Splitter <pattern> <text>");
            return;
        }

        Pattern pattern = null;
        try {
            pattern = Pattern.compile(argv[0]);
        }
        catch (PatternSyntaxException e) {
            System.err.println(e);
            return;
        }

        Splitter splitter = new Splitter(pattern);

        String text = argv[1];
        int counter = 1;
        for (String part : splitter.split(text)) {
            System.out.printf("Part %d: \"%s\"\n", counter++, part);
        }
    }
}

/*
    Example:
    > java Splitter "\W+" "Hello World!"
    Part 1: "Hello"
    Part 2: " "
    Part 3: "World"
    Part 4: "!"
    Part 5: ""
*/

I don't really like the other way, where you get an empty element in front and back. A delimiter is usually not at the beginning or at the end of the string, thus you most often end up wasting two good array slots.

Edit: Fixed limit cases. Commented source with test cases can be found here: http://snippets.dzone.com/posts/show/6453

查看更多
登录 后发表回答