Regular expression does not match newline obtained

Posted 2019-01-23 14:09

Question:

I cannot match a String containing a newline when the newline is produced by %n in a Formatter object or in String.format(). Please have a look at the following program:

public class RegExTest {

  public static void main(String[] args) {
    String input1 = String.format("Hallo\nnext line");
    String input2 = String.format("Hallo%nnext line");
    String pattern = ".*[\n\r].*";
    System.out.println(input1+": "+input1.matches(pattern));
    System.out.println(input2+": "+input2.matches(pattern));
  }

}

and its output:

Hallo
next line: true
Hallo
next line: false

What is going on here? Why doesn't the second string match?

Java version is 1.6.0_21.

Answer 1:

You can set the Pattern.DOTALL flag to make . match line terminators; by default it does not. The embedded form of the flag is (?s), so this regex does what you want:

    String pattern = "(?s).*[\n\r].*";
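
The same flag can also be passed to Pattern.compile instead of being embedded in the regex. The following is only a sketch: the class name DotAllDemo and the two helper methods are made up for illustration, and the input uses an explicit "\r\n" rather than %n so that the result does not depend on the platform.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Pattern from the question, without the embedded (?s) flag.
    private static final String REGEX = ".*[\n\r].*";

    // Default mode: '.' does not match line terminators.
    static boolean matchesDefault(String input) {
        return Pattern.compile(REGEX).matcher(input).matches();
    }

    // DOTALL mode: '.' matches any character, including \r and \n.
    static boolean matchesDotAll(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String crlf = "Hallo\r\nnext line"; // what %n produces on Windows
        System.out.println(matchesDefault(crlf)); // false
        System.out.println(matchesDotAll(crlf));  // true
    }
}
```

With DOTALL the leading .* can consume the \r and the character class matches the \n (or the other way around), so the whole input is covered.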


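Answer 2:

On Windows, in Java, \n is LF, \r is CR and %n is CRLF. Your pattern does not match the latter: [\n\r] consumes exactly one character, and the . on either side cannot match the remaining CR or LF, so matches() fails on the two-character sequence.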
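
To make the difference visible on any platform, here is a small sketch that writes the line breaks out explicitly instead of relying on %n. The class name LineBreakDemo and the CRLF_AWARE pattern are not from the question; the latter is just one possible alternation for Java versions that lack \R.

```java
public class LineBreakDemo {
    // Pattern from the question: exactly one \n or \r between the two halves.
    static final String SINGLE_CHAR = ".*[\n\r].*";

    // Hypothetical alternative: try CRLF as a unit first, then a lone LF or CR.
    static final String CRLF_AWARE = ".*(\r\n|[\n\r]).*";

    public static void main(String[] args) {
        String lf = "Hallo\nnext line";     // what "\n" always produces
        String crlf = "Hallo\r\nnext line"; // what %n produces on Windows

        System.out.println(lf.matches(SINGLE_CHAR));   // true
        System.out.println(crlf.matches(SINGLE_CHAR)); // false
        System.out.println(crlf.matches(CRLF_AWARE));  // true
        System.out.println(lf.matches(CRLF_AWARE));    // true
    }
}
```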

As of Java 8, you can use \R in regular expressions to match any end-of-line sequence. The Pattern documentation describes it as follows:

Linebreak matcher

\R Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

Example:

String pattern = ".*\\R.*";
String.format("Hallo\nnext line").matches(pattern); // true
String.format("Hallo%nnext line").matches(pattern); // true
String.format("Hallo same line").matches(pattern); // false