-->

Count frequency of words in a list and sort by fre

Posted 2019-01-01 15:14

Question:

I am using Python 3.3

I need to create two lists, one for the unique words and the other for the frequency of each word.

I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.

I have the design in text but am uncertain how to implement it in Python.

The methods I have found so far use either Counter or dictionaries, which we have not learned yet. I have already created the list from the file containing all the words, but I do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.

Here's the basic design:

original list = ["the", "car",....]
newlst = []
frequency = []
for word in the original list
    if word not in newlst
        newlst.append(word)
        set frequency = 1
    else
        increase the frequency
sort newlst based on frequency list

Answer 1:

Use this:

from collections import Counter
list1 = ['apple', 'egg', 'apple', 'banana', 'egg', 'apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
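
To get from the Counter to the two lists the question asks for, one possible sketch uses most_common(), which returns (word, count) pairs ordered from highest to lowest count. The names pairs, unique_words and frequencies are assumptions for illustration and are not part of the original answer.

```python
from collections import Counter

list1 = ['apple', 'egg', 'apple', 'banana', 'egg', 'apple']
counts = Counter(list1)

# most_common() yields (word, count) pairs, highest count first.
pairs = counts.most_common()

# Split the pairs into two parallel lists (hypothetical names).
unique_words = [word for word, count in pairs]
frequencies = [count for word, count in pairs]

print(unique_words)   # ['apple', 'egg', 'banana']
print(frequencies)    # [3, 2, 1]
```

The word at index i of unique_words has its count at index i of frequencies, so the two lists can be walked together with zip().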


Answer 2:

You can use

from collections import Counter

It is available in Python 2.7 and later; see the Python documentation for collections.Counter for more information.

1.

>>> c = Counter('abracadabra')
>>> c.most_common(3)
[('a', 5), ('b', 2), ('r', 2)]

The order of tied entries such as 'b' and 'r' can differ between Python versions.

With the values of a dict:

>>> d = {1: 'one', 2: 'one', 3: 'two'}
>>> c = Counter(d.values())
>>> c.most_common()
[('one', 2), ('two', 1)]

But you have to read the file first and convert its contents to a dict.

2. This is the example from the Python docs, using re and Counter:

# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]


Answer 3:

with open("test.txt", "r") as f:
    words = f.read().split()      # read the words into a list
uniqWords = sorted(set(words))    # remove duplicate words and sort alphabetically
for word in uniqWords:
    print(words.count(word), word)


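Answer 4:

You can use reduce(), a functional approach. In Python 3, reduce() must be imported from functools.

from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

returns (key order may vary by Python version):

{'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}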


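Answer 5:

One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:

list1 = []    # this is your original list of words
list2 = []    # this is a new list of [word, count] pairs

for word in list1:
    for pair in list2:
        if pair[0] == word:    # word already seen: increase its count
            pair[1] += 1
            break
    else:                      # no break: first occurrence of this word
        list2.append([word, 1])

Or, with try/except instead of the inner loop:

for word in list1:
    try:
        # index() raises ValueError if the word has not been seen yet
        position = [pair[0] for pair in list2].index(word)
        list2[position][1] += 1
    except ValueError:
        list2.append([word, 1])

This would be less efficient than using a dictionary, but it uses more basic concepts.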



Answer 6:

Yet another solution, with a different algorithm and without using collections:

def countWords(A):
    dic = {}
    for x in A:
        if x not in dic:        # Python 2.7: if not dic.has_key(x):
            dic[x] = A.count(x)
    return dic

dic = countWords(['apple', 'egg', 'apple', 'banana', 'egg', 'apple'])
sorted_items = sorted(dic.items())   # if you want it sorted (alphabetically by word)
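
Note that sorted(dic.items()) orders the pairs alphabetically by word. To order them by count instead, as the question asks, a minimal sketch could pass a key function to sorted(). The helper names count_words and sort_by_count are hypothetical, and breaking ties alphabetically is an assumption rather than something the question specifies.

```python
def count_words(words):
    """Map each distinct word to the number of times it occurs."""
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = words.count(word)
    return counts


def sort_by_count(counts):
    """Return (word, count) pairs, highest count first, ties by word."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


counts = count_words(['apple', 'egg', 'apple', 'banana', 'egg', 'apple'])
print(sort_by_count(counts))
# [('apple', 3), ('egg', 2), ('banana', 1)]
```

Negating the count in the key sorts counts in descending order while keeping words with equal counts in ascending alphabetical order.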


Answer 7:

The ideal way is to use a dictionary that maps each word to its count. But if you can't use that, you can use two lists: one storing the words and the other storing their counts. Note that the order of the words and counts matters here: the count at a given index must belong to the word at the same index. Implementing this would be harder and not very efficient.
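
The two-list idea described above can be sketched with nothing but lists and loops. This is one possible implementation under stated assumptions, not a definitive one: the function name count_and_sort and the sample input are made up for illustration, and a simple selection sort stands in for whatever sorting method the course expects.

```python
def count_and_sort(original_list):
    """Return (words, counts) as parallel lists, most frequent word first."""
    words = []
    counts = []
    for word in original_list:
        if word in words:
            # The same position in both lists refers to the same word.
            counts[words.index(word)] += 1
        else:
            words.append(word)
            counts.append(1)

    # Selection sort on counts, swapping both lists in step so the
    # word at index i always stays paired with its count.
    for i in range(len(counts)):
        largest = i
        for j in range(i + 1, len(counts)):
            if counts[j] > counts[largest]:
                largest = j
        counts[i], counts[largest] = counts[largest], counts[i]
        words[i], words[largest] = words[largest], words[i]
    return words, counts


words, counts = count_and_sort(["car", "the", "road", "the", "car", "the"])
print(words)   # ['the', 'car', 'road']
print(counts)  # [3, 2, 1]
```

Both the membership test and index() scan the words list, so this takes time proportional to the square of the number of words, which is the inefficiency the answer warns about.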



Answer 8:

Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.

# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# Create your frequency dictionary
freq = {}
# Iterate through them, once per unique word.
for word in word_set:
    freq[word] = word_list.count(word) / float(len(word_list))

freq will end up with the relative frequency of each word in the list you already have, that is, its count divided by the total number of words.

In Python 2 you need float in there to convert one of the integers to a float, so the resulting value will be a float. In Python 3, / already performs true division, so the float call is harmless but not required.

Edit:

If you can't use a dict or set, here is another, less efficient way:

# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
    if word not in unique_words:
        unique_words.append(word)
word_frequencies = []
for word in unique_words:
    word_frequencies.append(float(word_list.count(word)) / len(word_list))
for i in range(len(unique_words)):
    # str() is needed because a float cannot be concatenated to a string
    print(unique_words[i] + ": " + str(word_frequencies[i]))

The indices of unique_words and word_frequencies will match.



Answer 9:

Try this:

words = []
freqs = []

for line in sorted(original_list):  # go through all the lines of the text in sorted order
    line = line.rstrip()            # strip trailing whitespace
    if line not in words:           # check whether line is already in words
        words.append(line)          # if not, add it to the end of words
        freqs.append(1)             # and add 1 to the end of freqs
    else:
        index = words.index(line)   # if it is, find its position in words
        freqs[index] += 1           # and add 1 at the matching index in freqs
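
The loop above leaves words and freqs in alphabetical order rather than frequency order. To put the most frequent word first, one option is to sort the list positions by frequency and rebuild both lists from that ordering. The following is only a sketch: the sample contents of original_list and the name order are assumptions for illustration.

```python
# Made-up sample input: lines as they might be read from a file.
original_list = ["egg\n", "apple\n", "egg\n", "banana\n", "apple\n", "apple\n"]

words = []
freqs = []
for line in sorted(original_list):
    line = line.rstrip()
    if line not in words:
        words.append(line)
        freqs.append(1)
    else:
        freqs[words.index(line)] += 1

# Positions ordered by descending frequency; sorted() is stable, so
# words with equal counts stay in their existing alphabetical order.
order = sorted(range(len(freqs)), key=lambda i: freqs[i], reverse=True)
words = [words[i] for i in order]
freqs = [freqs[i] for i in order]

print(words)  # ['apple', 'egg', 'banana']
print(freqs)  # [3, 2, 1]
```

Sorting the positions instead of the values keeps the two lists aligned, because both are rebuilt from the same order list.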


Answer 10:

Here is code that addresses your question. is_word() checks whether a string is valid, and only valid strings are counted. What other languages call a hash map is a dictionary in Python.

def is_word(word):
    cnt = 0
    for c in word:
        # accept only letters, digits and '$'
        if 'a' <= c <= 'z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
            cnt += 1
    if cnt == len(word):
        return True
    return False

def words_freq(s):
    d = {}
    for i in s.split():
        if is_word(i):
            if i in d:
                d[i] += 1
            else:
                d[i] = 1
    return d

print(words_freq('the the sky$ is blue not green'))


Answer 11:

The best thing to do is:

def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist]
    return dict(zip(wordlist, wordfreq))

Then call it with your list: wordListToFreqDict(originallist)