Get the three most occurring words with their count values

Posted 2019-07-25 15:02

Question:

My code below gives me the most occurring word from a string. I want to get the three most occurring words from the vector, together with their count values. Any help?

I have used a vector and an unordered_map. In the last portion of the code I get the most occurring word from the vector.

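#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main(int argc,char *argv[])
    {
        // The text to analyse is expected as the first command-line argument
        if (argc < 2)
        {
            std::cerr << "Pass the text to analyse as the first argument" << std::endl;
            return 1;
        }

        typedef std::unordered_map<std::string,int> occurrences;
        occurrences s1;
        std::string input = argv[1];

        std::istringstream iss(std::move(input));
        std::vector<std::string> most;
        int max_count = 0,second=0,third=0;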


//Here I get max_count and the 2nd highest and 3rd highest count values
       while (iss >> input)
        {
            int tmp = ++s1[input];
            if (tmp == max_count)
            {
                most.push_back(input);
            }
            else if (tmp > max_count)
            {
                max_count = tmp;
                most.clear();
                most.push_back(input);
                third = second;
                second = max_count;
            }
            else if (tmp > second)
            {
                third = second;
                second = tmp;
            }
            else if (tmp > third)
            {
                third = tmp;
            }
        }

//I have not used max_count, second, third below. I don't know how to access them for my purpose

      //Print each word with its occurrence count. This works fine
      for (occurrences::const_iterator it = s1.cbegin();it != s1.cend(); ++it)
            std::cout << it->first << " : " << it->second << std::endl;

      //Prints the word that occurs most often. **Here I want to print the 1st, 2nd and 3rd most frequent words with their occurrence counts. How can I do that?**
      std::cout << std::endl << "Maximum Occurrences" << std::endl;
        for (std::vector<std::string>::const_iterator it = most.cbegin(); it != most.cend(); ++it)
            std::cout << *it << std::endl;

       return 0;
    } 

Any idea how to get the 3 most occurring words?

Answer 1:

I'd prefer to use a std::map<std::string, int> instead.

Use it as the source map and fill it by counting the words from a std::vector<std::string>.

Then create a std::multimap<int, std::string> that is a flipped version of the source map (count as key, word as value), with std::greater<int> as the comparator.

The first three entries of this final map are the three most frequently used words.

Example :

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::string> most { "lion","tiger","kangaroo",
                                    "donkey","lion","tiger",
                                    "lion","donkey","tiger"
                                  };

    // Count how often each word occurs
    std::map<std::string, int> src;
    for (const auto &x : most)
        ++src[x];

    // Flip (word, count) into (count, word), ordered by descending count
    std::multimap<int, std::string, std::greater<int> > dst;

    std::transform(src.begin(), src.end(), std::inserter(dst, dst.begin()),
                   [] (const std::pair<const std::string, int> &p) {
                       return std::pair<int, std::string>(p.second, p.first);
                   }
                  );

    // Print at most the first three entries, i.e. the highest counts
    auto it = dst.cbegin();
    for (int count = 0; count < 3 && it != dst.cend(); ++it, ++count)
        std::cout << it->second << ":" << it->first << std::endl;
}




Answer 2:

It is easier and cleaner to use a heap to store the three most occurring words. It is also easily extensible to a larger number of most occurring words.
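
The heap idea could be sketched roughly as follows. This is only one possible reading of it, not code from the answer: the helper name top_n_by_heap and the WordCount alias are made up for illustration, and the sketch assumes the words have already been split into a std::vector<std::string>. It keeps a min-heap of at most n (count, word) pairs, so the weakest candidate is always on top and can be dropped cheaply.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// (count, word); pairs compare by count first, then by word.
using WordCount = std::pair<int, std::string>;

// Hypothetical helper: returns up to n (count, word) pairs, highest count first.
std::vector<WordCount> top_n_by_heap(const std::vector<std::string>& words,
                                     std::size_t n)
{
    // Count how often each word occurs.
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words)
        ++counts[w];

    // Min-heap: the smallest of the kept entries sits on top.
    std::priority_queue<WordCount, std::vector<WordCount>,
                        std::greater<WordCount>> heap;
    for (const auto& kv : counts) {
        heap.emplace(kv.second, kv.first);
        if (heap.size() > n)
            heap.pop();  // drop the current lowest entry
    }

    // Popping yields ascending order, so fill the result from the back.
    std::vector<WordCount> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0; ) {
        result[i] = heap.top();
        heap.pop();
    }
    assert(result.size() <= n);
    return result;
}
```

Called with the words lion, tiger, kangaroo, donkey, lion, tiger, lion, donkey, tiger and n = 3, this returns tiger (3), lion (3) and donkey (2). Words with equal counts are ordered by the word itself, which is a side effect of comparing pairs rather than a deliberate design decision.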



Answer 3:

If I wanted to know the n most occurring words, I'd keep an n-element array, iterate over the list of words, and store each word that makes it into my top n in the array (dropping the lowest one).
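
A minimal sketch of that array approach for n = 3 might look like this. The names top_three_by_array, Entry and kTop are hypothetical, and the sketch assumes the words are counted first and that a count of 0 marks an empty slot. The array is kept sorted by descending count, and a new entry pushes the lower ones down so that the last one falls off.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t kTop = 3;             // n, fixed at three here
using Entry = std::pair<std::string, int>;  // (word, count); count 0 = empty slot

// Hypothetical helper: returns the kTop most frequent words, highest count first.
std::array<Entry, kTop> top_three_by_array(const std::vector<std::string>& words)
{
    // Count how often each word occurs.
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words)
        ++counts[w];

    std::array<Entry, kTop> top{};  // value-initialized: empty words, counts of 0
    for (const auto& kv : counts) {
        assert(kv.second > 0);  // real counts never collide with the empty marker

        // Find the first slot holding a strictly lower count.
        std::size_t pos = 0;
        while (pos < kTop && top[pos].second >= kv.second)
            ++pos;
        if (pos == kTop)
            continue;  // not good enough for the top kTop

        // Shift the lower entries down by one, dropping the last one.
        for (std::size_t i = kTop - 1; i > pos; --i)
            top[i] = std::move(top[i - 1]);
        top[pos] = Entry(kv.first, kv.second);
    }
    return top;
}
```

Slots that still have a count of 0 afterwards mean that fewer than three distinct words were seen. The order of words with equal counts depends on the iteration order of the unordered_map and is therefore unspecified.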



Tags: c++ vector