Given the following list
['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
'Moon', 'to', 'rise.', '']
I am trying to count how many times each word appears and display the top 3.
However, I only want the top three among words whose first letter is capitalized; words that do not start with a capital letter should be ignored.
I am sure there is a better way than this, but my idea was to do the following:
- put the first word in the list into another list called uniquewords
- delete the first word and all its duplicates from the original list
- add the new first word into uniquewords
- delete the first word and all its duplicates from the original list
- etc...
- until the original list is empty....
- count how many times each word in uniquewords appears in the original list
- find top 3 and print
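The steps above can be sketched roughly as follows. This is only an illustration of the idea, not a recommended solution: the function name top_three_manual is made up here, "capitalized" is assumed to mean that the first character is an uppercase letter, and a copy of the list is kept because the final counting step needs the words that the loop deletes.

```python
def top_three_manual(words):
    # Keep only capitalized words; w[:1] is safe for the empty string.
    remaining = [w for w in words if w[:1].isupper()]
    original = list(remaining)  # copy used for counting later

    uniquewords = []
    while remaining:
        first = remaining[0]
        uniquewords.append(first)
        # Delete the first word and all its duplicates.
        remaining = [w for w in remaining if w != first]

    # Count each unique word in the preserved copy, then sort by count.
    counts = [(original.count(w), w) for w in uniquewords]
    counts.sort(key=lambda pair: pair[0], reverse=True)
    return [(word, count) for count, word in counts[:3]]


sample = ['Jellicle', 'Cats', 'are', 'Jellicle', 'And', 'Cats',
          'Jellicle', 'Moon', '']
print(top_three_manual(sample))
```

Each call to original.count scans the whole list, so this approach does far more work than a single counting pass.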
To just return a list containing the most common words:
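A minimal sketch of such a snippet, assuming Python 2.7 or later and a short made-up word list (not the full list from the question); the variable names are illustrative:

```python
from collections import Counter

# Hypothetical sample data, chosen so that there are no ties in the counts.
words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'are',
         'Jellicle', 'Cats', 'and', 'Jellicle']

# most_common(3) yields (word, count) tuples; keep only the word from each.
most_common_words = [word for word, word_counter in Counter(words).most_common(3)]
print(most_common_words)
# ['Jellicle', 'Cats', 'are']
```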
The 3 in "most_common(3)" specifies the number of items to return. Counter(words).most_common() returns a list of tuples, each with the word as the first member and its frequency as the second member. The tuples are ordered by the frequency of the word, most frequent first.
The "word for word, word_counter in" part of the list comprehension extracts only the first member of each tuple.
The answer from @Mark Byers is best, but if you are on a version of Python < 2.7 (but at least 2.5, which is pretty old these days), you can replicate the Counter class functionality very simply via defaultdict (otherwise, for Python < 2.5, three extra lines of code are needed before d[i] += 1, as in @Johnnysweb's answer).
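A minimal sketch of such a replacement, assuming only the most_common method is needed. The attribute name counts and the implementation details are illustrative and are not taken from the standard library's Counter:

```python
from collections import defaultdict


class Counter(object):
    """Illustrative stand-in for collections.Counter (most_common only)."""

    def __init__(self, iterable=()):
        # defaultdict(int) supplies 0 for unseen keys, so += 1 always works.
        self.counts = defaultdict(int)
        for item in iterable:
            self.counts[item] += 1

    def most_common(self, n=None):
        # Sort (item, count) pairs by count, highest first.
        ranked = sorted(self.counts.items(),
                        key=lambda pair: pair[1], reverse=True)
        if n is None:
            return ranked
        return ranked[:n]
```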
Then, you use the class exactly as in Mark Byers's answer, i.e.:
In Python 2.7 and above there is a class called Counter, in the collections module, which can help you:
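A minimal sketch using the list from the question, assuming that "capitalized" means the first character is an uppercase letter. The slice word[:1] is used instead of word[0] so that the empty string at the end of the list does not raise an IndexError:

```python
from collections import Counter

words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
         'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
         'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
         'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
         'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
         'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
         'Moon', 'to', 'rise.', '']

# Keep only words that start with an uppercase letter, then count them.
capitalized = [word for word in words if word[:1].isupper()]
top_three = Counter(capitalized).most_common(3)
print(top_three)
# [('Jellicle', 6), ('Cats', 5), ('And', 2)]
```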
You could instead do this using a dictionary with the key being a word and the value being the count for that word. First iterate over the words, adding each one to the dictionary if it is not present, or else increasing its count if it is. Then to find the top three you can either use a simple O(n*log(n)) sorting algorithm and take the first three elements from the result, or you can use an O(n) algorithm that scans the list once, remembering only the top three elements.
An important observation for beginners is that by using builtin classes that are designed for the purpose you can save yourself a lot of work and/or get better performance. It is good to be familiar with the standard library and the features it offers.
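The dictionary approach described above can be sketched as follows. The function names are made up for illustration, and the single-pass variant is just one possible way to remember only the top entries:

```python
def count_words(words):
    # Key: word, value: number of times it has been seen so far.
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def top_n_by_sorting(counts, n=3):
    # O(m*log(m)) for m distinct words: sort everything, keep the first n.
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]


def top_n_by_scan(counts, n=3):
    # O(m) for a fixed n: one pass, keeping at most n entries at any time.
    top = []
    for word, count in counts.items():
        top.append((word, count))
        top.sort(key=lambda pair: pair[1], reverse=True)
        del top[n:]  # discard anything beyond the best n
    return top


sample = ['a', 'b', 'a', 'c', 'b', 'a', 'd', 'c', 'b', 'a']
counts = count_words(sample)
print(top_n_by_sorting(counts))
print(top_n_by_scan(counts))
```

For a list as short as the one in the question the difference between the two is negligible; the scan only pays off when there are many distinct words.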