I'm writing a Java program that adds every word on a web page to a HashMap, with each word as a key and the number of times it appears on the page as its value. For instance, if I ran it on a page containing only the words "hello, java, coffee, java", the output would be
    Java : 2
    Coffee : 1
    Hello : 1
This also ignores certain words that I don't want included. Here's what I have so far.
    Map<String, Integer> found = new HashMap<>(); // (word, frequency)
    Matcher match = Pattern.compile(word_pattern).matcher(content);
    while (match.find()) {
        // Get the next word in lowercase
        String word = match.group().toLowerCase();
        // If it is not in the set of words to ignore, add it to the found Map
        if (!ignore.contains(word))
            found.put(word, );
    }
    System.out.println(found);
    }
I assume the second argument, an int, should be calculated before I add the word to the HashMap:
    found.put(word, int );
But I'm unsure how exactly to add up the occurrences of each word while keeping the running time at O(n log n).
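For reference, here is a minimal sketch of one way the missing second argument might be handled, offered as an assumption rather than a confirmed solution. It uses `Map.merge` (Java 8+) to insert 1 the first time a word is seen and add 1 on every later sighting. The class name `WordCount`, the method `countWords`, and the letters-only pattern are hypothetical stand-ins, since `word_pattern`, `content`, and `ignore` are not shown above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; the original enclosing method is not shown.
class WordCount {
    // Assumed stand-in for word_pattern: runs of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    public static Map<String, Integer> countWords(String content, Set<String> ignore) {
        Map<String, Integer> found = new HashMap<>(); // (word, frequency)
        Matcher match = WORD.matcher(content);
        while (match.find()) {
            // Get the next word in lowercase
            String word = match.group().toLowerCase();
            if (!ignore.contains(word)) {
                // Store 1 if the word is new, otherwise add 1 to its current count
                found.merge(word, 1, Integer::sum);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so the printed order may vary
        System.out.println(countWords("hello, java, coffee, java", Set.of("the")));
    }
}
```

Because each `HashMap` lookup and update takes expected constant time, a single pass like this should run in roughly O(n) expected time for n words, which stays within the O(n log n) target.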