Most Efficient Way to Check File for List of Words

I just had a homework assignment that wanted me to add all the Java keywords to a HashSet, then read in a .java file and count how many times any keyword appeared in that file.

The route I took was: I created a String[] array that contained all the keywords, created a HashSet, and used Collections.addAll to add the array to the HashSet. Then, as I iterated through the text file, I checked each word with HashSet.contains(currentWordFromFile).
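For readers without access to the linked code, the approach described above could be sketched roughly as follows. This is not the asker's actual code: the class name KeywordSetSketch, the method countKeywords, and the shortened keyword list are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class and method names, for illustration only.
final class KeywordSetSketch {

    // Shortened list (an assumption); the assignment would list every Java keyword.
    static final String[] KEYWORDS = { "class", "if", "int", "public", "return", "static", "void" };

    // Counts how many of the given words are present in the keyword set.
    static int countKeywords(Iterable<String> words) {
        Set<String> keywords = new HashSet<>();
        Collections.addAll(keywords, KEYWORDS);

        int count = 0;
        for (String word : words) {
            if (keywords.contains(word)) { // hash lookup, constant time on average
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "public static int twice(int x) { return x * 2; }";
        // Split on runs of non-word characters to get individual tokens.
        System.out.println(countKeywords(Arrays.asList(line.split("\\W+"))));
    }
}
```

Splitting on non-word characters is a simplification: it would also count keywords that appear inside comments or string literals.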

Someone recommended using a Hashtable to do this. Then I saw a similar example using a TreeSet. I was just curious: what's the recommended way to do this?

(Complete code here: http://pastebin.com/GdDmCWj0 )

Tags: java hashtable hashset treeset

2 Answers

做个烂人

#2 · 2019-08-14 09:11

You said "had a homework assignment" so I'm assuming you're done with this.

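I would do it a bit differently. First, I think some of the keywords in your String array were incorrect. According to Wikipedia and Oracle, Java has 50 keywords; the array below also includes the literals true, false, and null, which are reserved words but not technically keywords. Anyway, I've commented my code fairly well. Here's what I came up with...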

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.HashMap;

public class CountKeywords {

    public static void main(String args[]) {

        String[] theKeywords = { "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char", "class", "const", "continue", "default", "do", "double", "else", "enum", "extends", "false", "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long", "native", "new", "null", "package", "private", "protected", "public", "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try", "void", "volatile", "while" };

        // put each keyword in the map with value 0 
        Map<String, Integer> theKeywordCount = new HashMap<String, Integer>();
        for (String str : theKeywords) {
            theKeywordCount.put(str, 0);
        }

        File file = new File(args[0]);

        // attempt to open and read the file; try-with-resources closes the reader
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {

            String sLine;

            // read lines until reaching the end of the file
            while ((sLine = br.readLine()) != null) {

                // extract the words from the current line in the file
                for (String word : sLine.split("\\W+")) {

                    // only keywords are in the map; bump the count for this one
                    if (theKeywordCount.containsKey(word)) {
                        theKeywordCount.put(word, theKeywordCount.get(word) + 1);
                    }
                }
            }

        } catch (FileNotFoundException exception) {
            // Unable to find file.
            exception.printStackTrace();
        } catch (IOException exception) {
            // Unable to read line.
            exception.printStackTrace();
        }

        // add up how many times any keyword was encountered
        int occurrences = 0;
        for (Integer i : theKeywordCount.values()) {
            occurrences += i;
        }

        System.out.println("\n\nTotal occurrences in file: " + occurrences);
    }
}

Every time I read a word from the file, I first check whether it's in the Map. If it isn't, it's not a keyword; if it is, I update the value associated with that keyword, i.e., I increment the associated Integer by 1 because we've seen this keyword once more.

Alternatively, you could get rid of that last for loop and just keep a running count, so you would instead have...

// word is the current word taken from the line being read
if (theKeywordCount.containsKey(word)) {
    occurrences++;
}

... and you print out the counter at the end.

I don't know if this is the most efficient way to do this, but I think it's a solid start.

Let me know if you have any questions. I hope this helps.
Hristo


姐就是有狂的资本

#3 · 2019-08-14 09:26

Try a Map<String, Integer> where the String is the word and the Integer is the number of times the word has been seen.

One benefit of this is that you do not need to process the file twice.
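A minimal sketch of that idea, assuming the file has already been read into a list of lines. The class name WordCountSketch and the method countWords are hypothetical names chosen for this example, and Map.merge needs Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class and method names, for illustration only.
final class WordCountSketch {

    // Builds a word -> count map in a single pass over the lines.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Split on runs of non-word characters to get individual tokens.
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    // Store 1 for a new word, otherwise add 1 to the existing count.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("if (a) return a;", "return b;");
        Map<String, Integer> counts = countWords(lines);
        // Any keyword can be looked up afterwards without rereading the input.
        System.out.println("return: " + counts.getOrDefault("return", 0));
        System.out.println("while: " + counts.getOrDefault("while", 0));
    }
}
```

Because every word is counted, the keyword total can be obtained afterwards by looking up each keyword in the map, at the cost of storing non-keyword words as well.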
