Item frequency count in Python

Posted 2019-01-01 15:25

Question:

I'm a Python newbie, so my question may be very basic. Assume I have a list of words, and I want to find the number of times each word appears in that list. The obvious way to do this is:

words = "apple banana apple strawberry banana lemon"
uniques = set(words.split())
freqs = [(item, words.split().count(item)) for item in uniques]
print(freqs)

But I don't think this code is very good, because the program runs through the word list twice: once to build the set, and a second time to count the appearances. Of course, I could write a function that runs through the list and does the counting, but that wouldn't be very Pythonic. So, is there a more efficient and Pythonic way?

Answer 1:

defaultdict to the rescue!

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

This runs in O(n).
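
As a minimal sketch of why a single pass is enough, the loop above can be wrapped in a function. The name count_words is only an illustration, not something from the standard library. defaultdict(int) supplies 0 for any key it has not seen yet, so each word costs one dictionary update.

```python
from collections import defaultdict

def count_words(text):
    """Return a plain dict mapping each whitespace-separated word to its count."""
    counts = defaultdict(int)  # missing keys start at int() == 0
    for word in text.split():
        counts[word] += 1      # one dictionary update per word, so one pass overall
    return dict(counts)

freqs = count_words("apple banana apple strawberry banana lemon")
print(freqs)
```

Converting back to a plain dict at the end is optional; it just avoids silently creating keys when someone later looks up a word that was never counted.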



Answer 2:

If you are using Python 2.7+/3.1+, there is a Counter class in the collections module which is purpose-built to solve this type of problem:

>>> from collections import Counter
>>> words = "apple banana apple strawberry banana lemon"
>>> freqs = Counter(words.split())
>>> print(freqs)
Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})

Since both 2.7 and 3.1 are still in beta, it's unlikely you're using it, so just keep in mind that a standard way of doing this kind of work will soon be readily available.
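
For anyone who does have Counter available, here is a short sketch of a few things it can do beyond plain counting. The sample words are the ones from the question; "cherry" is just a made-up word that is not in the list.

```python
from collections import Counter

freqs = Counter("apple banana apple strawberry banana lemon".split())

# most_common(n) returns the n highest counts as (item, count) pairs
top_two = freqs.most_common(2)
print(top_two)

# A Counter returns 0 for items it has never seen, without raising KeyError
print(freqs["cherry"])

# update() adds counts from another iterable instead of replacing them
freqs.update(["lemon", "lemon"])
print(freqs["lemon"])
```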



Answer 3:

Standard approach:

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"
words = words.split()
result = defaultdict(int)  # missing keys default to 0
for word in words:
    result[word] += 1

print(result)

groupby one-liner:

from itertools import groupby

words = "apple banana apple strawberry banana lemon"
words = words.split()

# groupby only merges adjacent equal items, so the list is sorted first
result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
print(result)


Answer 4:

# Assumes words is a list of words, e.g. the question's string after .split()
freqs = {}
for word in words:
    freqs[word] = freqs.get(word, 0) + 1  # fetch and increment OR initialize

I think this gives the same result as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable IMHO. It is almost identical to Thomas Weigel's solution, but without using exceptions.

This could be slower than using defaultdict from the collections module, however, since the value is fetched, incremented, and then assigned again rather than just incremented in place. That said, += might do just the same internally.
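
Rather than guessing, the two approaches can be timed. The following is only a rough sketch using timeit; the function names and the repeat count are arbitrary choices, and the numbers will differ between machines and Python versions, so no particular result is claimed here.

```python
import timeit
from collections import defaultdict

# Repeat the sample words so each run does a measurable amount of work
words = "apple banana apple strawberry banana lemon".split() * 1000

def count_with_get(items):
    freqs = {}
    for item in items:
        freqs[item] = freqs.get(item, 0) + 1
    return freqs

def count_with_defaultdict(items):
    freqs = defaultdict(int)
    for item in items:
        freqs[item] += 1
    return freqs

# Both approaches must agree before their timings are worth comparing
same_result = count_with_get(words) == dict(count_with_defaultdict(words))

get_seconds = timeit.timeit(lambda: count_with_get(words), number=100)
dd_seconds = timeit.timeit(lambda: count_with_defaultdict(words), number=100)
print("dict.get:    %.4f s" % get_seconds)
print("defaultdict: %.4f s" % dd_seconds)
```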



Answer 5:

If you don't want to use the standard dictionary method (looping through the list and incrementing the proper dict key), you can try this:

>>> from itertools import groupby
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.
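
The sort is what makes this work, and it is also where the O(n log n) comes from. A small sketch of what happens with and without it, using the same word list (the variable names are just for illustration):

```python
from itertools import groupby

my_list = "apple banana apple strawberry banana lemon".split()

# Without sorting, groupby starts a new group every time the key changes,
# so repeated words that are not adjacent are reported separately.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(my_list)]

# Sorting first (the O(n log n) step) makes equal words adjacent.
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(my_list))]

print(unsorted_runs)
print(sorted_runs)
```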



Answer 6:

Without defaultdict:

words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    try:
        my_count[word] += 1
    except KeyError:
        my_count[word] = 1


Answer 7:

Can't you just use count?

words = 'the quick brown fox jumps over the lazy gray dog'
words.count('z')
#output: 1
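
One thing to watch for: calling count on a string counts substrings, not words. A quick sketch of the difference, using a slightly extended sentence ("then stops" is added here purely to show the effect):

```python
sentence = "the quick brown fox jumps over the lazy gray dog then stops"

# str.count counts non-overlapping substring matches, so the "the" inside "then" is included
substring_hits = sentence.count("the")

# list.count on the split words only counts whole-word matches
word_hits = sentence.split().count("the")

print(substring_hits)  # 3
print(word_hits)       # 2
```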


Answer 8:

The approach below takes some extra cycles, but it is another way to do it:

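def func(tup):
    # Sort key: the last element of each (word, count) tuple, i.e. the count
    return tup[-1]


def print_words(filename):
    # Read the whole file and lowercase it so counting is case-insensitive
    with open(filename, 'r') as f:
        whole_content = f.read().lower()
    print(whole_content)
    list_content = whole_content.split()
    counts = {}
    # First pass: initialise every word to 0
    for one_word in list_content:
        counts[one_word] = 0
    # Second pass: count the occurrences
    for one_word in list_content:
        counts[one_word] += 1
    print(counts.items())
    print(sorted(counts.items(), key=func))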
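
If the goal is mainly the sorted output, the same idea can be sketched without a file and with the most frequent words first. This version swaps in Counter and operator.itemgetter instead of the two loops and the helper above; sorted_word_counts and the sample text are made up for illustration.

```python
from collections import Counter
from operator import itemgetter

def sorted_word_counts(text):
    """Return (word, count) pairs from text, ordered from most to least frequent."""
    counts = Counter(text.lower().split())
    # itemgetter(1) picks the count out of each (word, count) pair
    return sorted(counts.items(), key=itemgetter(1), reverse=True)

pairs = sorted_word_counts("Apple banana apple Strawberry banana lemon apple")
print(pairs)
```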


Answer 9:

I happened to be working on a Spark exercise; here is my solution.

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

print({n: float(tokens.count(n)) / float(len(tokens)) for n in tokens})

Output of the above:

{'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666}
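
Calling tokens.count(n) inside the comprehension rescans the whole list for every token. A possible single-pass variant, sketched here with Counter; the token list has one word repeated (an assumption made for this example) so that the frequencies are not all equal.

```python
from collections import Counter

tokens = ["quick", "brown", "fox", "quick", "lazy", "dog"]

counts = Counter(tokens)      # one pass over the tokens
total = float(len(tokens))    # float() keeps true division on Python 2 as well
relative = {word: count / total for word, count in counts.items()}
print(relative)
```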


Answer 10:

Use reduce() to convert the list to a single dict.

from functools import reduce  # built in on Python 2; must be imported on Python 3

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

returns

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}


Answer 11:

words = "apple banana apple strawberry banana lemon"
w = words.split()
e = list(set(w))
for i in e:
    print(w.count(i))    # Prints the frequency of every word in the list

Hope this helps!