java.util.regex - importance of Pattern.compile()?

2020-01-24 10:38发布

What is the importance of Pattern.compile() method?
Why do I need to compile the regex string before getting the Matcher object?

For example :

String regex = "((\\S+)\\s*some\\s*";

Pattern pattern = Pattern.compile(regex); // why do I need to compile
Matcher matcher = pattern.matcher(text);

标签: java regex
8条回答
【Aperson】
2楼-- · 2020-01-24 11:04

It is matter of performance and memory usage, compile and keep the complied pattern if you need to use it a lot. A typical usage of regex is to validated user input (format), and also format output data for users, in these classes, saving the complied pattern, seems quite logical as they usually called a lot.

Below is a sample validator, which is really called a lot :)

public class AmountValidator {
    //Accept 123 - 123,456 - 123,345.34
    private static final String AMOUNT_REGEX="\\d{1,3}(,\\d{3})*(\\.\\d{1,4})?|\\.\\d{1,4}";
    //Compile and save the pattern  
    private static final Pattern AMOUNT_PATTERN = Pattern.compile(AMOUNT_REGEX);


    public boolean validate(String amount){

         if (!AMOUNT_PATTERN.matcher(amount).matches()) {
            return false;
         }    
        return true;
    }    
}

As mentioned by @Alan Moore, if you have reusable regex in your code, (before a loop for example), you must compile and save pattern for reuse.

查看更多
家丑人穷心不美
3楼-- · 2020-01-24 11:06

Pattern.compile() allow to reuse a regex multiple times (it is threadsafe). The performance benefit can be quite significant.

I did a quick benchmark:

    @Test
    public void recompile() {
        var before = Instant.now();
        for (int i = 0; i < 1_000_000; i++) {
            Pattern.compile("ab").matcher("abcde").matches();
        }
        System.out.println("recompile " + Duration.between(before, Instant.now()));
    }

    @Test
    public void compileOnce() {
        var pattern = Pattern.compile("ab");
        var before = Instant.now();
        for (int i = 0; i < 1_000_000; i++) {
            pattern.matcher("abcde").matches();
        }
        System.out.println("compile once " + Duration.between(before, Instant.now()));
    }

compileOnce was between 3x and 4x faster. I guess it highly depends on the regex itself but for a regex that is often used, I go for a static Pattern pattern = Pattern.compile(...)

查看更多
你好瞎i
4楼-- · 2020-01-24 11:11

When you compile the Pattern Java does some computation to make finding matches in Strings faster. (Builds an in-memory representation of the regex)

If you are going to reuse the Pattern multiple times you would see a vast performance increase over creating a new Pattern every time.

In the case of only using the Pattern once, the compiling step just seems like an extra line of code, but, in fact, it can be very helpful in the general case.

查看更多
ら.Afraid
5楼-- · 2020-01-24 11:12

Compile parses the regular expression and builds an in-memory representation. The overhead to compile is significant compared to a match. If you're using a pattern repeatedly it will gain some performance to cache the compiled pattern.

查看更多
倾城 Initia
6楼-- · 2020-01-24 11:15

Pre-compiling the regex increases the speed. Re-using the Matcher gives you another slight speedup. If the method gets called frequently say gets called within a loop, the overall performace will certainly go up.

查看更多
别忘想泡老子
7楼-- · 2020-01-24 11:30

Similar to 'Pattern.compile' there is 'RECompiler.compile' [from com.sun.org.apache.regexp.internal] where:
1. compiled code for pattern [a-z] has 'az' in it
2. compiled code for pattern [0-9] has '09' in it
3. compiled code for pattern [abc] has 'aabbcc' in it.

Thus compiled code is a great way to generalize multiple cases. Thus instead of having different code handling situation 1,2 and 3 . The problem reduces to comparing with the ascii of present and next element in the compiled code, hence the pairs. Thus
a. anything with ascii between a and z is between a and z
b. anything with ascii between 'a and a is definitely 'a'

查看更多
登录 后发表回答