I'm working on a homework problem that requires me to read in words from an input file, along with an integer k. The solution needs to print a list of words and their frequencies, ranging from the most frequent to the k-th most frequent. If there are fewer than k unique words, it should output only that many.
This would have been a piece of cake with containers like map, but the problem restricts me to vectors and strings only, with no other STL containers.
I'm stuck at the point where I have a list of all the words in a file and their corresponding frequencies. Now I need to sort them according to their frequencies and output k words.
The problem is that sorting is difficult, because the frequencies can have different numbers of digits. If I sort the combined "frequency word" strings with std::sort() after padding the counts with zeros, I don't know how many zeros to pad, since the input isn't known in advance.
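To make the padding problem concrete, here is a minimal sketch of what happens when the combined strings are sorted directly. The helper name sort_as_strings and the sample words are made up for illustration only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper for illustration: sorts "count word" strings the way
// std::sort does by default, i.e. lexicographically, character by character.
std::vector<std::string> sort_as_strings(std::vector<std::string> entries)
{
    std::sort(entries.begin(), entries.end());
    return entries;
}

// sort_as_strings({"9 apple", "10 banana", "100 cherry"}) returns
// {"10 banana", "100 cherry", "9 apple"}: '1' compares less than '9',
// so 10 and 100 land before 9 even though they are larger numbers.
```

Without padding, the string order does not match the numeric order, which is why the counts would have to be compared as integers instead.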
Here's my code for the function:
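#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

// remove_punc() and is_unique() are helper functions not shown here.
void word_frequencies(ifstream& inf, int k)
{
    vector<string> input;
    string w;
    while (inf >> w)
    {
        remove_punc(w);
        input.push_back(w);
    }
    sort(input.begin(), input.end());

    // initialize frequency vector: every word starts with a count of 1
    vector<int> freq(input.size(), 1);

    // count actual frequencies: the last element of each run of equal
    // words ends up holding the length of that run
    int count = 0;
    for (size_t i = 0; i < input.size(); ++i)
    {
        if (i + 1 < input.size() && input[i] == input[i+1])
        {
            ++count;
        }
        else
        {
            freq[i] += count;
            count = 0;
        }
    }

    // words+frequencies
    vector<string> wf;
    for (size_t i = 0; i < freq.size(); ++i)
    {
        if (freq[i] > 1 || is_unique(input, input[i]))
        {
            string s = to_string(freq[i]) + " " + input[i];
            wf.push_back(s);
        }
    }
    // stuck here: wf still has to be sorted by frequency and the first k printed
}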
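The counting step could probably be done in a single pass over the sorted vector. This is only a sketch, assuming the input is already sorted. The name count_runs and the use of two parallel vectors are hypothetical choices, not something the assignment prescribes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: given a SORTED vector of words, fills two parallel
// vectors so that counts[i] is the number of occurrences of words[i].
void count_runs(const std::vector<std::string>& sorted,
                std::vector<std::string>& words,
                std::vector<int>& counts)
{
    words.clear();
    counts.clear();
    for (std::size_t i = 0; i < sorted.size(); ++i)
    {
        if (!words.empty() && words.back() == sorted[i])
        {
            ++counts.back();             // same word as the previous one: extend the run
        }
        else
        {
            words.push_back(sorted[i]);  // new word: start a run of length 1
            counts.push_back(1);
        }
    }
}
```

Because each distinct word is stored exactly once, no separate uniqueness check is needed afterwards.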
Also, should I even couple the frequency with the word in the first place? I know this is messy so I'm looking for a more elegant solution.
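A minimal sketch of the coupled approach, assuming a plain struct is allowed under the vectors-and-strings-only rule. WordCount, by_frequency and top_k are hypothetical names, and the alphabetical tie-break is an assumption the assignment does not state. The counts stay integers, so they compare numerically and no zero padding is involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: keeps a word and its count together.
struct WordCount
{
    std::string word;
    int count;
};

// Orders by count, highest first; ties are broken alphabetically
// (an assumption, the assignment does not say how to break ties).
bool by_frequency(const WordCount& a, const WordCount& b)
{
    if (a.count != b.count) return a.count > b.count;
    return a.word < b.word;
}

// Returns the k most frequent entries, or all of them if there are fewer than k.
std::vector<WordCount> top_k(std::vector<WordCount> entries, int k)
{
    std::sort(entries.begin(), entries.end(), by_frequency);
    if (k < 0) k = 0;
    if (static_cast<std::size_t>(k) < entries.size())
        entries.erase(entries.begin() + k, entries.end());
    return entries;
}
```

Printing is then a loop over the returned vector, writing each entry's count and word.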
Thanks!