Regex searches beyond string boundary

Posted 2020-04-12 08:27

The code is given below:

import java.util.regex.*;

public class RegEx {

    public static void main(String[] args) {

        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab56ef");
        System.out.println("Pattern is " + m.pattern());
        while (m.find()) {
            System.out.print("index: " + m.start() + " " + m.group());
        }
    }
}

The result is:

Pattern is \d*
index: 0 index: 1 index: 2 56index: 4 index: 5 index: 6

Since the length of "ab56ef" is 6, the string's highest index is 5.
Why is there a match at index 6? Thank you in advance!

1 answer
够拽才男人
#2 · 2020-04-12 09:05

Six matches are returned because \d* can also match an empty string. A 6-character string has 7 positions where a match can start: one before each character (indices 0 to 5) and one at the very end (index 6). The match at index 6 is the empty string after the last character, so the engine never reads past the end of the string.

Here is a visualization of the positions the engine tries (each | marks a position, numbered below it):

    | a | b | 5 | 6 | e | f |
    0   1   2   3   4   5   6

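Here, the engine examines the beginning of the string and says: "I see no digit, but I can still return a match, since the number of digits can be 0". It returns the empty string as a match and moves on to the position before b. At index 2 it matches 56 greedily, then it continues in the same way until the end of the string, where it returns one last empty match at index 6.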
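To make the walk-through concrete, here is a minimal sketch that prints both the start and the end of every match. The class name EmptyMatchDemo and the helper describeMatches are made up for illustration; only Pattern, Matcher, find(), start(), end() and group() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {

    // Hypothetical helper: describes every match of regex in input.
    static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // start() == end() means the match is empty (no characters consumed).
            out.add("start=" + m.start() + " end=" + m.end()
                    + " text=\"" + m.group() + "\"");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeMatches("\\d*", "ab56ef")) {
            System.out.println(line);
        }
    }
}
```

For "ab56ef" the last line printed is start=6 end=6 text="", which shows that the match at index 6 has zero length and sits at the end of the string.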

If you only need the numbers, use the + quantifier with the \d shorthand class (\d+). It requires at least one digit, so it never matches an empty string.
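As a rough sketch of that suggestion, the snippet below runs \d+ over the same input. The class name DigitRunsDemo and the helper findNumbers are illustrative names, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRunsDemo {

    // Hypothetical helper: returns one "index: <start> <digits>" entry per run of digits.
    static List<String> findNumbers(String input) {
        List<String> out = new ArrayList<>();
        // \d+ needs at least one digit, so empty matches are impossible.
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            out.add("index: " + m.start() + " " + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : findNumbers("ab56ef")) {
            System.out.println(line); // prints "index: 2 56"
        }
    }
}
```

With the question's input this yields a single match, at index 2, and nothing at index 6.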

