String x=" i am going to the party at 6.00 in the evening. are you coming with me?";
If I have the above string, I need to break it into sentences at sentence boundary punctuation (such as . and ?),
but it should not split at 6.00, because the period there is a decimal point and not the end of a sentence. Is there a way to identify the correct sentence boundaries in Java? I have tried StringTokenizer from the java.util package, but it always breaks the sentence whenever it finds a period. Can someone suggest a way to do this correctly?
This is the method I have tried for tokenizing a text into sentences:
public static ArrayList<String> sentence_segmenter(String text) {
    ArrayList<String> Sentences = new ArrayList<String>();
    StringTokenizer st = new StringTokenizer(text, ".?!");
    while (st.hasMoreTokens()) {
        Sentences.add(st.nextToken());
    }
    return Sentences;
}
I also have a method to segment sentences into phrases, but here too the program splits the text whenever it finds a comma (,). I don't want it to split when the comma sits in the middle of a number such as 60,000. The following is the method I am using to segment the phrases:
public static ArrayList<String> phrasesSegmenter(String text) {
    ArrayList<String> phrases = new ArrayList<String>();
    StringTokenizer st = new StringTokenizer(text, ",");
    while (st.hasMoreTokens()) {
        phrases.add(st.nextToken());
    }
    return phrases;
}
Here is my solution to the problem.
From the documentation of StringTokenizer:
If you use split instead, you can use any regular expression to split the text into sentences. You probably want something like any of ?!. followed by either a space or the end of the text:
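To make that idea concrete, here is a minimal sketch of the split-based approach, not a definitive implementation. The class name SegmenterSketch and the method names are hypothetical, and both regular expressions are my own assumptions about what "any of ?!. followed by a space or end of text" could look like. The second method applies the same trick to the comma case from the question by leaving a comma alone when it has a digit on both sides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and patterns are illustrative assumptions.
class SegmenterSketch {

    // A '.', '?' or '!' counts as a sentence boundary only when it is
    // followed by whitespace or by the end of the text, so "6.00" stays whole.
    // Like StringTokenizer, this drops the punctuation itself.
    static List<String> splitSentences(String text) {
        return splitAndTrim(text, "[.?!](?:\\s+|$)");
    }

    // A comma counts as a phrase boundary unless it has a digit on both
    // sides, so "60,000" stays whole.
    static List<String> splitPhrases(String text) {
        return splitAndTrim(text, "(?<!\\d),|,(?!\\d)");
    }

    // Splits on the given regex, trims each piece and skips empty pieces.
    private static List<String> splitAndTrim(String text, String regex) {
        List<String> parts = new ArrayList<>();
        for (String part : text.split(regex)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String x = " i am going to the party at 6.00 in the evening. are you coming with me?";
        System.out.println(splitSentences(x));
        System.out.println(splitPhrases("it costs 60,000 rupees, which is a lot"));
    }
}
```

With the string from the question this gives two sentences, and 6.00 is left intact. Note that it is still a heuristic: an abbreviation followed by a space (for example "Mr. Smith") would be split as well, so treat it as a starting point.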