java regex match count

Posted 2019-01-02 16:00

Let's say I have a file, and the file contains this:

HelloxxxHelloxxxHello

I compile a pattern to look for 'Hello'

Pattern pattern = Pattern.compile("Hello");

Then I use an InputStream to read the file and convert its contents into a String so that the pattern can be applied to it.

Once the matcher finds a match in the file, it indicates this, but it doesn't tell me how many matches it found; simply that it found a match within the String.

So, as the string is relatively short, and the buffer I'm using is 200 bytes, it should find three matches. However, it just simply says match, and doesn't provide me with a count of how many matches there were.

What's the easiest way of counting the number of matches that occurred within the String? I've tried various for loops and matcher.groupCount(), but I'm getting nowhere fast.

4 answers
梦醉为红颜
Reply #2 · 2019-01-02 16:11

matcher.find() does not find all matches, only the next match.

You'll have to do the following:

int count = 0;
while (matcher.find())
    count++;

Btw, matcher.groupCount() is something completely different.
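
To make the difference concrete: groupCount() reports how many capturing groups the pattern itself declares, regardless of the input, so it cannot serve as a match counter. A minimal sketch (the class name GroupCountDemo and the helper countMatches are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: counts non-overlapping matches by calling find() repeatedly.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "HelloxxxHelloxxxHello";

        Pattern noGroups = Pattern.compile("Hello");
        Pattern oneGroup = Pattern.compile("(Hel)lo");

        // groupCount() depends only on the pattern: the number of capturing groups.
        System.out.println(noGroups.matcher(input).groupCount()); // prints 0
        System.out.println(oneGroup.matcher(input).groupCount()); // prints 1

        // The number of matches comes from looping over find().
        System.out.println(countMatches(noGroups, input)); // prints 3
        System.out.println(countMatches(oneGroup, input)); // prints 3
    }
}
```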

Complete example:

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);

        int count = 0;
        while (matcher.find())
            count++;

        System.out.println(count);    // prints 3
    }
}

Handling overlapping matches

When counting matches of aa in aaaa the above snippet will give you 2.

aaaa
aa
  aa

To get 3 matches, i.e. this behavior:

aaaa
aa
 aa
  aa

You have to search for a match at index <start of last match> + 1 as follows:

String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);

int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}

System.out.println(count);    // prints 3
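
An alternative for overlapping matches, offered as a sketch rather than as part of the original answer, is to wrap the search term in a zero-width lookahead. Each match then consumes no characters, so a plain find() loop visits every starting position. The class name OverlapCount and the helper countOverlapping are hypothetical, and the helper assumes the search term is a non-empty literal string rather than an arbitrary regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapCount {
    // Hypothetical helper: counts overlapping occurrences of a non-empty literal string.
    static int countOverlapping(String literal, String input) {
        // Pattern.quote escapes any regex metacharacters in the literal;
        // (?=...) is a lookahead, which matches a position without consuming text.
        Pattern pattern = Pattern.compile("(?=" + Pattern.quote(literal) + ")");
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aa", "aaaa"));                     // prints 3
        System.out.println(countOverlapping("Hello", "HelloxxxHelloxxxHello")); // prints 3
    }
}
```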
美炸的是我
Reply #3 · 2019-01-02 16:18

This should work for overlapping (non-disjoint) matches:

public static void main(String[] args) {
    String input = "aaaaaaaa";
    String regex = "aa";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    int from = 0;
    int count = 0;
    while(matcher.find(from)) {
        count++;
        from = matcher.start() + 1;
    }
    System.out.println(count);    // prints 7
}
何处买醉
Reply #4 · 2019-01-02 16:19

If you want to use Java 8 streams and are allergic to while loops, you could try this:

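public static int countPattern(String references, Pattern referencePattern) {
    Matcher matcher = referencePattern.matcher(references);
    // The filter has a side effect: each call advances the matcher by one match.
    // The first index i for which find() fails therefore equals the match count.
    return Stream.iterate(0, i -> i + 1)
            .filter(i -> !matcher.find())
            .findFirst()
            .get();
}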

Disclaimer: this only works for disjoint matches.

Example:

public static void main(String[] args) {
    Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
    System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
    System.out.println(countPattern("[  ]", referencePattern));
}

This prints out:

2
0
1
0
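
On Java 9 or later, Matcher.results() returns a stream of the successive matches, so the same count can be obtained without relying on a side effect inside filter. This is a sketch assuming a Java 9+ runtime; the class name ResultsCount is made up, and the method returns long because that is what Stream.count() yields. Like the version above, it only counts disjoint matches:

```java
import java.util.regex.Pattern;

public class ResultsCount {
    // Counts non-overlapping matches using Matcher.results() (available since Java 9).
    static long countPattern(String references, Pattern referencePattern) {
        return referencePattern.matcher(references).results().count();
    }

    public static void main(String[] args) {
        Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
        System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\" ]", referencePattern)); // prints 2
        System.out.println(countPattern("[  ]", referencePattern));                                          // prints 0
    }
}
```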
与君花间醉酒
Reply #5 · 2019-01-02 16:20

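This may help, but note that split() drops trailing empty strings by default, so the plain array length is not a reliable count (for "xxxHelloxxx" it would be 2, not 1). Passing a limit of -1 keeps them, and for a pattern that never matches the empty string the number of matches is then the number of pieces minus one:

public static void main(String[] args) {
    String hello = "HelloxxxHelloxxxHello";
    // A limit of -1 keeps trailing empty strings: ["", "xxx", "xxx", ""]
    String[] matches = hello.split("Hello", -1);
    System.out.println(matches.length - 1);    // prints 3
}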