Count frequency of words in a list and sort by fre

2019-01-01 15:24发布

I am using Python 3.3

I need to create two lists, one for the unique words and the other for the frequencies of the word.

I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.

I have the design in text but am uncertain how to implement it in Python.

The methods I have found so far use either Counter or dictionaries which we have not learned. I have already created the list from the file containing all the words but do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.

Here's the basic design:

                 original list = ["the", "car",....]
                 newlst = []
                 frequency = []
                 for word in the original list
                       if word not in newlst
                           newlst.append(word)
                           set frequency = 1
                       else
                           increase the frequency
                 sort newlst based on frequency list 

11条回答
浅入江南
2楼-- · 2019-01-01 15:34

The ideal way is to use a dictionary that maps a word to it's count. But if you can't use that, you might want to use 2 lists - 1 storing the words, and the other one storing counts of words. Note that order of words and counts matters here. Implementing this would be hard and not very efficient.

查看更多
唯独是你
3楼-- · 2019-01-01 15:35

Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.

# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
    freq[word] = word_list.count(word) / float(len(word_list))

freq will end up with the frequency of each word in the list you already have.

You need float in there to convert one of the integers to a float, so the resulting value will be a float.

Edit:

If you can't use a dict or set, here is another less efficient way:

# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
    if word not in unique_words:
        unique_words += [word]
word_frequencies = []
for word in unique_words:
    word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
    print(unique_words[i] + ": " + word_frequencies[i])

The indicies of unique_words and word_frequencies will match.

查看更多
墨雨无痕
4楼-- · 2019-01-01 15:36
words = file("test.txt", "r").read().split() #read the words into a list.
uniqWords = sorted(set(words)) #remove duplicate words and sort
for word in uniqWords:
    print words.count(word), word
查看更多
只若初见
5楼-- · 2019-01-01 15:37

use this

from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
查看更多
千与千寻千般痛.
6楼-- · 2019-01-01 15:37

Here is code support your question is_char() check for validate string count those strings alone, Hashmap is dictionary in python

def is_word(word):
   cnt =0
   for c in word:

      if 'a' <= c <='z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
          cnt +=1
   if cnt==len(word):
      return True
  return False

def words_freq(s):
  d={}
  for i in s.split():
    if is_word(i):
        if i in d:
            d[i] +=1
        else:
            d[i] = 1
   return d

 print(words_freq('the the sky$ is blue not green'))
查看更多
姐姐魅力值爆表
7楼-- · 2019-01-01 15:40

Try this:

words = []
freqs = []

for line in sorted(original list): #takes all the lines in a text and sorts them
    line = line.rstrip() #strips them of their spaces
    if line not in words: #checks to see if line is in words
        words.append(line) #if not it adds it to the end words
        freqs.append(1) #and adds 1 to the end of freqs
    else:
        index = words.index(line) #if it is it will find where in words
        freqs[index] += 1 #and use the to change add 1 to the matching index in freqs
查看更多
登录 后发表回答