Java Regex String#replaceAll Alternative

Posted 2019-01-20 13:30

Question:

I've been trying to devise a method of replacing multiple String#replaceAll calls with a Pattern/Matcher instance in the hopes that it would be faster than my current method of replacing text in a String, but I'm not sure how to go about it.

Here is an example of a String that I want to manipulate:

@bla@This is a @red@line @bla@of text.

As you can see, there are multiple @ characters with 3 characters in between; this will always be the case. If I wanted to replace every instance of '@xxx@' (where each x can be any lowercase letter or a digit from 0 to 9), what would be the most efficient way to go about it? Currently I store a Map whose keys are the '@xxx@' substrings and whose values are what I want to replace each substring with; for each key, I check whether the whole String contains it and, if so, call replaceAll, but I imagine this is pretty inefficient.

Thank you very much!

TL;DR - Would a Pattern/Matcher to replace a substring of a String with a different String be more efficient than checking if the String contains the substring and using String#replaceAll? If so, how would I go about it?

Answer 1:

This is a relatively straightforward case for appendReplacement:

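import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prepare map of replacements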
Map<String,String> replacement = new HashMap<>();
replacement.put("bla", "hello,");
replacement.put("red", "world!");
// Use a pattern that matches three non-@s between two @s
Pattern p = Pattern.compile("@([^@]{3})@");
Matcher m = p.matcher("@bla@This is a @red@line @bla@of text");
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Group 1 captures what's between the @s
    String tag = m.group(1);
    String repString = replacement.get(tag);
    if (repString == null) {
        System.err.println("Tag @"+tag+"@ is unexpected.");
        continue;
    }
    // Replacement could have special characters, e.g. '\'
    // Matcher.quoteReplacement() will deal with them correctly:
    m.appendReplacement(sb, Matcher.quoteReplacement(repString));
}
m.appendTail(sb);
String result = sb.toString();
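
On Java 9 or later, the same loop can probably be written more compactly with Matcher.replaceAll(Function<MatchResult, String>). The following is only a sketch of that variant, not part of the original answer: the class name TagReplacer and the method name replaceTags are made up for illustration, and it assumes that unknown tags should be left in the text unchanged, which is also what the loop above does when it skips appendReplacement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption, not from the answer.
public class TagReplacer {
    // Compile once and reuse: Pattern instances are immutable and thread-safe.
    private static final Pattern TAG = Pattern.compile("@([^@]{3})@");

    static String replaceTags(String input, Map<String, String> replacement) {
        // Requires Java 9+: the function is called once per match.
        return TAG.matcher(input).replaceAll(match -> {
            String repString = replacement.get(match.group(1));
            // Assumption: an unknown tag is put back exactly as it was matched.
            String out = (repString != null) ? repString : match.group();
            // The returned string is still interpreted as a replacement
            // string, so '\' and '$' must be quoted.
            return Matcher.quoteReplacement(out);
        });
    }

    public static void main(String[] args) {
        Map<String, String> replacement = new HashMap<>();
        replacement.put("bla", "hello,");
        replacement.put("red", "world!");
        // prints: hello,This is a world!line hello,of text
        System.out.println(replaceTags("@bla@This is a @red@line @bla@of text", replacement));
    }
}
```

Keeping the Pattern in a static final field means it is compiled only once, which matters if the method is called for many strings.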

Answer 2:

This is a more dynamic version of a previous answer to a similar question.

Here is a helper method for searching for any @keyword@ you want. They don't have to be 3 characters long.

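// Requires: java.util.Map, java.util.StringJoiner,
//           java.util.regex.Matcher, java.util.regex.Pattern
private static String replace(String input, Map<String, String> replacement) {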
    StringJoiner regex = new StringJoiner("|", "@(", ")@");
    for (String keyword : replacement.keySet())
        regex.add(Pattern.quote(keyword));
    StringBuffer output = new StringBuffer();
    Matcher m = Pattern.compile(regex.toString()).matcher(input);
    while (m.find())
        m.appendReplacement(output, Matcher.quoteReplacement(replacement.get(m.group(1))));
    return m.appendTail(output).toString();
}

Test

Map<String,String> replacement = new HashMap<>();
replacement.put("bla", "hello,");
replacement.put("red", "world!");
replacement.put("Hold", "wait");
replacement.put("Better", "more");
replacement.put("a?b*c", "special regex characters");
replacement.put("foo @ bar", "with spaces and the @ boundary character work");

System.out.println(replace("@bla@This is a @red@line @bla@of text", replacement));
System.out.println(replace("But @Hold@, this can do @Better@!", replacement));
System.out.println(replace("It can even handle @a?b*c@ without dying", replacement));
System.out.println(replace("Keyword @foo @ bar@ too", replacement));

Output

hello,This is a world!line hello,of text
But wait, this can do more!
It can even handle special regex characters without dying
Keyword with spaces and the @ boundary character work too
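
Since the question is about efficiency, note that the helper above rebuilds and recompiles its pattern on every call. If the same keyword map is applied to many strings, one option is to compile the pattern once and reuse it. The sketch below illustrates that idea; the class name KeywordReplacer and the method name apply are hypothetical, and it assumes the map contains no null values. It also returns the input unchanged for an empty map, because the regex built from an empty map would be "@()@", which matches "@@" and then looks up a key that does not exist.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the helper above; names are assumptions.
public class KeywordReplacer {
    private final Map<String, String> replacement;
    private final Pattern pattern; // null when there are no keywords

    public KeywordReplacer(Map<String, String> replacement) {
        // Defensive copy so later changes to the caller's map cannot
        // get out of sync with the compiled pattern.
        this.replacement = new HashMap<>(replacement);
        if (this.replacement.isEmpty()) {
            this.pattern = null;
        } else {
            StringJoiner regex = new StringJoiner("|", "@(", ")@");
            for (String keyword : this.replacement.keySet())
                regex.add(Pattern.quote(keyword));
            // Compiled once here, reused by every call to apply().
            this.pattern = Pattern.compile(regex.toString());
        }
    }

    public String apply(String input) {
        if (pattern == null)
            return input; // nothing to replace
        Matcher m = pattern.matcher(input);
        StringBuffer output = new StringBuffer();
        while (m.find())
            m.appendReplacement(output, Matcher.quoteReplacement(replacement.get(m.group(1))));
        return m.appendTail(output).toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacement = new HashMap<>();
        replacement.put("Hold", "wait");
        replacement.put("Better", "more");
        KeywordReplacer replacer = new KeywordReplacer(replacement);
        // prints: But wait, this can do more!
        System.out.println(replacer.apply("But @Hold@, this can do @Better@!"));
    }
}
```

Whether this is measurably faster than the repeated replaceAll calls depends on the input sizes and the number of keywords, so it is worth benchmarking with realistic data.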