java.util.regex - importance of Pattern.compile()?

Posted 2020-01-24 10:38

What is the importance of the Pattern.compile() method?
Why do I need to compile the regex string before getting the Matcher object?

For example:

String regex = "(\\S+)\\s*some\\s*";

Pattern pattern = Pattern.compile(regex); // why do I need to compile
Matcher matcher = pattern.matcher(text);

Tags: java regex
8 answers
趁早两清
#2 · 2020-01-24 11:31

The compile() method is always called at some point; it's the only way to create a Pattern object. So the question is really, why should you call it explicitly? One reason is that you need a reference to the Matcher object so you can use its methods, like group(int) to retrieve the contents of capturing groups. The only way to get ahold of the Matcher object is through the Pattern object's matcher() method, and the only way to get ahold of the Pattern object is through the compile() method. Then there's the find() method which, unlike matches(), is not duplicated in the String or Pattern classes.
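
To make the first point concrete, here is a minimal sketch of the two things you only get from an explicit Matcher: find() for scanning through a string, and group(int) for reading what a capturing group matched. The class name, method name, regex, and sample input are made up for illustration and are not from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical pattern: a run of non-whitespace, optional whitespace, then the literal "some".
    private static final Pattern WORD_BEFORE_SOME = Pattern.compile("(\\S+)\\s*some");

    // Collects the text captured by group 1 for every match found in the input.
    static List<String> wordsBeforeSome(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_BEFORE_SOME.matcher(text);
        while (m.find()) {            // find() advances to the next match, if any
            words.add(m.group(1));    // contents of the first capturing group
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsBeforeSome("want some tea, need some rest"));
    }
}
```

Neither find() nor group(int) has a counterpart on String, so this kind of loop cannot be written with String#matches(String) alone.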

The other reason is to avoid creating the same Pattern object over and over. Every time you use one of the regex-powered methods in String (or the static matches() method in Pattern), it creates a new Pattern and a new Matcher. So this code snippet:

for (String s : myStringList) {
    if ( s.matches("\\d+") ) {
        doSomething();
    }
}

...is exactly equivalent to this:

for (String s : myStringList) {
    if ( Pattern.compile("\\d+").matcher(s).matches() ) {
        doSomething();
    }
}

Obviously, that's doing a lot of unnecessary work. In fact, it can easily take longer to compile the regex and instantiate the Pattern object than it does to perform an actual match. So it usually makes sense to pull that step out of the loop. You can create the Matcher ahead of time as well, though a Matcher is not nearly as expensive to create:

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("");
for (String s : myStringList) {
    if ( m.reset(s).matches() ) {
        doSomething();
    }
}
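
A common way to apply this is to hold the compiled Pattern in a static final field, so it is built once per class rather than once per call. The sketch below assumes a hypothetical DigitFilter class and countAllDigits helper; neither comes from the original answer. Pattern instances are immutable and safe to share between threads, whereas Matcher instances are not, which is why the Matcher stays local to the method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitFilter {
    // Compiled once when the class is initialized, then reused by every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts the strings that consist entirely of one or more digits.
    static int countAllDigits(List<String> strings) {
        Matcher m = DIGITS.matcher("");   // one Matcher, reset for each input
        int count = 0;
        for (String s : strings) {
            if (m.reset(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAllDigits(Arrays.asList("123", "12a", "", "007")));
    }
}
```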

If you're familiar with .NET regexes, you may be wondering if Java's compile() method is related to .NET's RegexOptions.Compiled modifier; the answer is no. Java's Pattern.compile() method is merely equivalent to .NET's Regex constructor. When you specify the Compiled option:

Regex r = new Regex(@"\d+", RegexOptions.Compiled); 

...it compiles the regex directly to CIL byte code, allowing it to perform much faster, but at a significant cost in up-front processing and memory use--think of it as steroids for regexes. Java has no equivalent; there's no difference between a Pattern that's created behind the scenes by String#matches(String) and one you create explicitly with Pattern#compile(String).

(EDIT: I originally said that all .NET Regex objects are cached, which is incorrect. Since .NET 2.0, automatic caching occurs only with static methods like Regex.Matches(), not when you call a Regex constructor directly.)

forever°为你锁心
#3 · 2020-01-24 11:31

The Pattern class is the entry point to the regex engine. You can use it through Pattern.matches() and Pattern.compile(). The difference between the two:

matches() - a quick, one-off check of whether a text (String) matches a given regular expression.
compile() - creates a reusable Pattern object, so you can match the same regular expression against multiple texts.

For reference:

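import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDemo {
    public static void main(String[] args) {
        // One-off use: Pattern.matches() compiles the regex and matches in a single call
        String text = "The Moon is far away from the Earth";
        String pattern = ".*is.*";
        boolean matches = Pattern.matches(pattern, text);
        System.out.println("Matches::" + matches);

        // Repeated use: compile once, then reuse the Pattern
        Pattern p = Pattern.compile("ab");
        Matcher m = p.matcher("abaaaba");
        while (m.find()) {
            System.out.println(m.start() + " ");   // start index of each match
        }
    }
}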
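
One detail the example above relies on: matches() succeeds only if the entire input matches the regex, which is why the first pattern is written as .*is.* rather than just is, while find() looks for the next matching substring. A small sketch of that difference, with a made-up class name and helper methods:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // True only if the regex matches the entire text.
    static boolean wholeMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    // Counts non-overlapping occurrences of the pattern anywhere in the text.
    static int countOccurrences(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "The Moon is far away from the Earth";
        System.out.println(wholeMatch("is", text));       // "is" alone does not cover the whole text
        System.out.println(wholeMatch(".*is.*", text));   // padded with .* on both sides
        System.out.println(countOccurrences(Pattern.compile("ab"), "abaaaba"));
    }
}
```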