Word Count no duplicates

Posted 2019-09-21 21:05

Question:

Here is my word count program in Java. I need to rewrite it so that something, something; something? something! and something count as one word. That means it should not count the same word twice, regardless of case and punctuation.

import java.util.Scanner;
public class WordCount1
{
    public static void main(String[]args)
    {
        final int Lines=6;
        Scanner in=new Scanner (System.in);
        String paragraph = "";
        System.out.println( "Please input "+ Lines + " lines of text.");
        for (int i=0; i < Lines; i+=1)
        {
            paragraph=paragraph+" "+in.nextLine();
        }
        System.out.println(paragraph);
        String word="";
        int WordCount=0;
        for (int i=0; i<paragraph.length()-1; i+=1)
        {
            if (paragraph.charAt(i) != ' ' || paragraph.charAt(i) !=',' || paragraph.charAt(i)    !=';' || paragraph.charAt(i) !=':' )
            {
                word= word + paragraph.charAt(i);
                if(paragraph.charAt(i+1)==' ' || paragraph.charAt(i) ==','|| paragraph.charAt(i) ==';' || paragraph.charAt(i) ==':')
                {
                    WordCount +=1;
                    word="";
                }
            }
        }
        System.out.println("There are "+WordCount +" words ");
    }
}

Answer 1:

Since this is homework, here are some hints and advice.

  • There is a clever little method called String.split that splits a string into parts, using a separator specified as a regular expression. If you use it the right way, this will give you a one line solution to the "word count" problem. (If you've been told not to use split, you can ignore that ... though it is the simple solution that a seasoned Java developer would consider first.)

  • Format / indent your code properly ... before you show it to other people. If your instructor doesn't deduct marks for this, he / she isn't doing his / her job properly.

  • Use standard Java naming conventions. The capitalization of Lines is incorrect. It could be LINES for a manifest constant or lines for a variable, but a mixed case name starting with a capital letter should always be a class name.

  • Be consistent in your use of white space characters around operators (including the assignment operator).

  • It is a bad idea (and completely unnecessary) to hard wire the number of lines of input that the user must supply. And you are not dealing with the case where he / she supplies fewer than 6 lines.
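
To make the split hint and the last point concrete, here is a rough sketch rather than a finished homework answer. The class name SplitWordCount, the helper names, and the delimiter pattern are my own assumptions; the pattern treats runs of whitespace and ASCII punctuation as separators, which also breaks "don't" into two words. It counts every word, duplicates included.

```java
import java.util.Scanner;

class SplitWordCount {
    // Assumed delimiter set: runs of whitespace and ASCII punctuation.
    private static final String DELIMITERS = "[\\s\\p{Punct}]+";

    // Reads lines until the input ends instead of insisting on exactly six.
    static String readAll(Scanner in) {
        StringBuilder text = new StringBuilder();
        while (in.hasNextLine()) {
            text.append(in.nextLine()).append(' ');
        }
        return text.toString();
    }

    // Counts every word, duplicates included.
    static int countWords(String text) {
        int count = 0;
        for (String token : text.split(DELIMITERS)) {
            // split yields an empty first token when the text starts with a delimiter
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Fixed sample input for illustration; pass System.in to read from the console.
        Scanner in = new Scanner("one, two;\nthree? One!");
        System.out.println("There are " + countWords(readAll(in)) + " words");
    }
}
```

The emptiness check is there because split drops trailing empty strings but keeps a leading one.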



Answer 2:

You should just remove punctuation and change to a single case before doing further processing. (Be careful with locales and Unicode.)

Once you have broken the input into words, you can count the number of unique words by passing them into a Set and checking the size of the set.
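
A minimal sketch of that idea follows. The class and method names are hypothetical, and it assumes a "word" is a run of Unicode letters or decimal digits, which is only one possible definition.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class DistinctWordCount {
    // Counts distinct words, ignoring case and punctuation.
    static int countDistinct(String text) {
        // Locale.ROOT keeps lower-casing independent of the machine's default
        // locale (a Turkish default locale, for example, maps "I" to a dotless i).
        String lower = text.toLowerCase(Locale.ROOT);

        Set<String> unique = new HashSet<>();
        // Assumption: a word is a run of Unicode letters or decimal digits.
        for (String token : lower.split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                unique.add(token); // adding a duplicate leaves the set unchanged
            }
        }
        return unique.size();
    }

    public static void main(String[] args) {
        String sample = "Something, something; SOMETHING? something! and something";
        System.out.println(countDistinct(sample)); // prints 2
    }
}
```

Using \p{L} instead of [a-z] means accented and non-Latin letters are kept as part of a word rather than treated as separators.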



Answer 3:

Here you go. This works. Just read the comments and you should be able to follow.

import java.util.Arrays;
import java.util.HashSet;
import javax.swing.JOptionPane;

// Program Counts Words In A Sentence. Duplicates Are Not Counted.
public class WordCount
{
    public static void main(String[]args)
    {
        // Initialize Variables
        String sentence = "";
        int wordCount = 1, startingPoint = 0;


        // Prompt User For Sentence
        sentence = JOptionPane.showInputDialog(null, "Please input a sentence.", "Input Information Below", 2);


        // Remove All Punctuations. To Check For More Punctuations Just Add Another Replace Statement.
        sentence = sentence.replace(",", "").replace(".", "").replace("?", "")
                           .replace(";", "").replace("!", "").replace(":", "");


        // Convert All Characters To Lowercase - Must Be Done To Compare Upper And Lower Case Words.
        sentence = sentence.toLowerCase();


        // Count The Number Of Words
        for (int i = 0; i < sentence.length(); i++)
            if (sentence.charAt(i) == ' ')
                wordCount++;


        // Initialize Array And A Count That Will Be Used As An Index
        String[] words = new String[wordCount];
        int count = 0;


        // Put Each Word In An Array
        for (int i = 0; i < sentence.length(); i++)
        {
            if (sentence.charAt(i) == ' ')
            {
                words[count] = sentence.substring(startingPoint,i);
                startingPoint = i + 1;
                count++;
            }
        }


        // Put Last Word In Sentence In Array
        words[wordCount - 1] = sentence.substring(startingPoint, sentence.length());


        // Put Array Elements Into A Set. This Will Remove Duplicates
        HashSet<String> wordsInSet = new HashSet<String>(Arrays.asList(words));


        // Format Words In Hash Set To Remove Brackets, And Commas, And Convert To String
        String wordsString = wordsInSet.toString().replace(",", "").replace("[", "").replace("]", "");


        // Print Out Non-Duplicate Words In Set And Word Count
        JOptionPane.showMessageDialog(null, "Words In Sentence:\n" + wordsString + " \n\n" +
                                                "Word Count: " + wordsInSet.size(), "Sentence Information", 2);
    }
}


Answer 4:

If you know the marks you want to ignore (;, ?, !) you could do a simple String.replace to remove those characters from the word. You may want to use String.startsWith and String.endsWith to help.

Convert your values to lower case for easier matching (String.toLowerCase).

The use of a 'Set' is an excellent idea. If you want to know how many times a particular word appears, you could also take advantage of a Map of some kind.
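
As an illustration of the startsWith / endsWith suggestion, here is one possible helper. WordNormalizer, normalize and the MARKS list are names and choices I made up for the sketch. Unlike a blanket String.replace, it only strips marks from the two ends of a word, so an apostrophe inside "don't" would survive even if it were added to the list.

```java
import java.util.Locale;

class WordNormalizer {
    // Assumed list of marks to ignore; extend as needed.
    private static final String[] MARKS = {";", "?", "!", ",", ".", ":"};

    // Strips the listed marks from both ends of a word and lower-cases the rest.
    static String normalize(String word) {
        String result = word;
        boolean stripped = true;
        // Repeat until a full pass removes nothing, so "word?!" is handled too.
        while (stripped) {
            stripped = false;
            for (String mark : MARKS) {
                if (result.endsWith(mark)) {
                    result = result.substring(0, result.length() - mark.length());
                    stripped = true;
                }
                if (result.startsWith(mark)) {
                    result = result.substring(mark.length());
                    stripped = true;
                }
            }
        }
        return result.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Something?!")); // prints something
    }
}
```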



Answer 5:

  1. You'll need to strip out the punctuation; here's one approach: Translating strings character by character

  2. The above can also be used to normalize the case, although there are probably other utilities for doing so.

  3. Now all of the variations you describe will be converted to the same string, and thus be recognized as such. As pretty much everyone else has suggested, a set would be a good tool for counting the number of distinct words.
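
The character-by-character translation in steps 1 and 2 could look roughly like this. It is a sketch with hypothetical names, and it works on UTF-16 char values, so characters outside the Basic Multilingual Plane are not handled specially.

```java
class CharTranslator {
    // Maps letters and digits to lower case and every other character to a space.
    static String translate(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(Character.isLetterOrDigit(c) ? Character.toLowerCase(c) : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Punctuation becomes spaces, so the words can be split on whitespace afterwards.
        System.out.println(translate("Word? word!"));
    }
}
```

Replacing punctuation with a space instead of deleting it keeps "one,two" from fusing into a single word.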



Answer 6:

Your real problem is that you want a distinct word count, so you should either keep track of the words you have already encountered, or delete them from the text entirely.

Let's say that you choose the first option and store the words you have already encountered in a List; then you can check against that list whether you have already seen a word.

List<String> encounteredWords = new ArrayList<String>();
// continue here once you have found out what the word is
if (!encounteredWords.contains(word.toLowerCase())) {
    encounteredWords.add(word.toLowerCase());
    wordCount++;
}

Antimony also made an interesting remark: he uses the property of a Set to get the distinct word count. By definition a set can never contain duplicates, so if you add the same word again, the set won't grow in size.

Set<String> wordSet = new HashSet<String>();
// continue here once you have found out what the word is
wordSet.add(word.toLowerCase());
// continue here once you have scanned through all the words
return wordSet.size();


Answer 7:

  1. remove all punctuation
  2. convert all strings to lowercase OR uppercase
  3. put those strings in a set
  4. get the size of the set


Answer 8:

As you parse your input string, store it word by word in a map data structure. Just ensure that "word", "word?" and "word!" are all stored with the key "word" in the map, and increment the word's count whenever you have to add to the map.
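
One way this might look is sketched below. WordFrequency and countWords are hypothetical names, a TreeMap is used only so the printed order is predictable, and the split pattern (runs of anything that is not a Unicode letter or digit) is an assumption about what separates words.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordFrequency {
    // Builds a map from each normalized word to the number of times it occurs.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                // merge inserts 1 for a new key, otherwise adds 1 to the existing count
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("word, Word? word! other");
        System.out.println(counts);        // prints {other=1, word=3}
        System.out.println(counts.size()); // prints 2, the number of distinct words
    }
}
```

The size of the map is the distinct word count, so this covers the original question and also tells you how often each word appears.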



Tags: java count word