Write a regular expression to count sentences

Posted 2020-04-01 07:58

I have a String:

"Hello world... I am here. Please respond."

and I would like to count the number of sentences within the String. I had an idea to use a Scanner as well as the useDelimiter method to split any String into sentences.

Scanner in = new Scanner(file);
in.useDelimiter("insert here");

I'd like to create a regular expression that can go through the String shown above and identify it as having two sentences. I initially tried using the delimiter:

[^?.]

It gets hung up on the ellipsis.

Tags: java regex
4 answers
家丑人穷心不美
#2 · 2020-04-01 08:34

A regular expression probably isn't the right tool for this. English is not a regular language, so regular expressions get hung up a lot. For one thing, you can't even be sure that a period in the middle of the text marks the end of a sentence: abbreviations (like Mr.), acronyms with periods, and initials will trip you up as well.
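
If you'd rather not hand-roll a pattern, one option that ships with the JDK is java.text.BreakIterator. Below is a minimal sketch, assuming English text; the class name SentenceBreaks and the method countSentences are made up for illustration. It is still rule-based, so abbreviations may trip it up too.

```java
import java.text.BreakIterator;
import java.util.Locale;

class SentenceBreaks {

    // Counts the non-blank segments that BreakIterator reports as sentences.
    static int countSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int count = 0;
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) pair is one segment.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (!text.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSentences("I am here. Please respond."));
    }
}
```

How it treats an ellipsis or "Mr." depends on the iterator's built-in rules, so test it on your own text before relying on the count.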

Root(大扎)
#3 · 2020-04-01 08:37

This could help:

public int getNumSentences() {
    List<String> tokens = getTokens("[^!?.]+");
    return tokens.size();
}

You can also treat a line break as a separator, and keep that independent of your OS, with the following line of code:

String pattern = System.getProperty("line.separator") + " ";

You can find more about this here: Java regex: newline + white space

So finally the method becomes:

public int getNumSentences() 
{
    List<String> tokens = getTokens( "[^!?.]+" + pattern + "+" );
    return tokens.size();
}
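
The getTokens helper isn't shown above, so here is a rough sketch of what it might look like, assuming it simply returns every match of the pattern in the text. The class name TokenSketch and the extra text parameter are my own guesses, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {

    // Hypothetical stand-in for getTokens: collects every match of regex in text.
    static List<String> getTokens(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Each run of characters that are not '!', '?' or '.' counts as one sentence.
    static int getNumSentences(String text) {
        return getTokens(text, "[^!?.]+").size();
    }

    public static void main(String[] args) {
        System.out.println(getNumSentences("Hello world... I am here. Please respond."));
    }
}
```

With this reading of getTokens, the string from the question yields three tokens ("Hello world", " I am here" and " Please respond"), because the ellipsis only separates runs and never forms a token of its own.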

Hope this helps :)

smile是对你的礼貌
#4 · 2020-04-01 08:40

You could use a regular expression that matches a character that does not end a sentence, followed by one that does, like:

[^?!.][?!.]

Although as @Gabe Sechan points out, a regular expression may not be accurate when the sentence includes abbreviated words such as Dr., Rd., St., etc.
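
To make that concrete, here is a small sketch that counts matches of the pattern with Matcher.find(). The class name SentenceEndCounter and the method countSentences are just illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceEndCounter {

    // One character that is not '?', '!' or '.', followed by one that is.
    private static final Pattern SENTENCE_END = Pattern.compile("[^?!.][?!.]");

    // Counts every place where ordinary text runs into a terminator.
    static int countSentences(String text) {
        Matcher matcher = SENTENCE_END.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSentences("Hello world... I am here. Please respond."));
        System.out.println(countSentences("Dr. Smith lives on Main St. He is home."));
    }
}
```

The ellipsis is counted only once, since only its first period follows a non-terminator, so the string from the question gives 3. The second call also gives 3 even though the text has two sentences, which is the abbreviation problem in action.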

贼婆χ
#5 · 2020-04-01 08:56

For your sentence: "Hello world... I am here. Please respond."

The code would be:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaRegex {

    public static void main(String[] args) {
        int count = 0;
        String sentence = "Hello world... I am here. Please respond.";
        // Matches a literal period followed by any one character.
        Pattern pattern = Pattern.compile("\\..");
        Matcher matcher = pattern.matcher(sentence);
        // Count every match found in the sentence.
        while (matcher.find()) {
            count++;
        }
        System.out.println("No. of sentences = " + count);
    }

}