I'm trying to take a string:
String s = "This is a String!";
And return all 2-word pairs within that string. Namely:
{"this is", "is a", "a String"}
But right now, all I can get it to do is return:
{"this is", "a String"}
How can I define my while loop such that I can account for this lack of overlapping words? My code is as follows: (Really, I'd be happy with it just returning an int representing how many string subsets it found...)
int count = 0;
while(matcher.find()) {
count += 1;
}
Thanks all.
I like the two answers already posted, counting words and subtracting one, but if you just need a regex to find overlapping matches:
Pattern pattern = Pattern.compile("\\S+ \\S+");
Matcher matcher = pattern.matcher(inputString);
int matchCount = 0;
boolean found = matcher.find();
while (found) {
matchCount += 1;
// search starting after the last match began
found = matcher.find(matcher.start() + 1);
}
In practice you need to be a little more clever than simply adding 1: on "the force" this also matches "he force" and then "e force", because each new search resumes in the middle of the previous first word. Of course, this is overkill for counting words, but it may prove useful if the regex is more complicated than that.
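One way to avoid the mid-word matches is to make the match itself zero-width: a lookbehind pins the position to the start of a word, and a lookahead captures the two words without consuming them, so a plain find() loop sees every overlapping pair. The following is only a sketch under the assumption that words are separated by whitespace; the class name OverlappingPairs and the method wordPairs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingPairs {
    // (?<!\S)  : the previous character is not a non-space, i.e. we are at the start of a word
    // (?=...)  : look ahead and capture two words without consuming any input
    private static final Pattern PAIR =
            Pattern.compile("(?<!\\S)(?=(\\S+)\\s+(\\S+))");

    static List<String> wordPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        // Each match is empty, so find() simply moves on to the next position.
        while (matcher.find()) {
            pairs.add(matcher.group(1) + " " + matcher.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [This is, is a, a String!]
        System.out.println(wordPairs("This is a String!"));
    }
}
```

If only the count is needed, wordPairs(input).size() gives it; on "the force" this finds exactly one pair.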
Total pair count = Total number of words - 1
And you already know how to count the total number of words.
Run a for loop from i = 0 to (number of words - 2); words i and i+1 then make up one 2-word string.
String[] splitString = string.split(" ");
for(int i = 0; i < splitString.length - 1; i++) {
System.out.println(splitString[i] + " " + splitString[i+1]);
}
The number of 2-word strings within a sentence is simply the number of words minus one.
int numOfPairs = string.split(" ").length - 1;
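Note that split(" ") produces empty tokens when the input contains consecutive spaces, which throws the count off. A minimal sketch that trims the input and splits on runs of whitespace instead; the class name SplitPairs and the method pairs are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SplitPairs {
    static List<String> pairs(String text) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result; // no words, so no pairs
        }
        // "\\s+" treats any run of whitespace as a single separator
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length - 1; i++) {
            result.add(words[i] + " " + words[i + 1]);
        }
        return result;
    }
}
```

The number of pairs is then pairs(text).size(), which equals the word count minus one for any input with at least one word.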
I tried it with capture groups in the pattern, restarting each search at the start of the second word (group 3):
String s = "this is a String";
Pattern pat = Pattern.compile("([^ ]+)( )([^ ]+)");
Matcher mat = pat.matcher(s);
boolean check = mat.find();
while(check){
System.out.println(mat.group());
// resume at the second word so it becomes the first word of the next pair
check = mat.find(mat.start(3));
}
The groups in the pattern ([^ ]+)( )([^ ]+) are:
group(0): the whole match, ([^ ]+)( )([^ ]+)
group(1): the first ([^ ]+), i.e. the first word
group(2): ( ), the space between the two words
group(3): the second ([^ ]+), i.e. the second word