Given the following list:
['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
'Moon', 'to', 'rise.', '']
I am trying to count how many times each word appears and display the top 3.
However, I only want the top three words whose first letter is capitalized; all words that do not start with a capital letter should be ignored.
I am sure there is a better way than this, but my idea was to do the following:
- put the first word in the list into another list called uniquewords
- delete the first word and all its duplicates from the original list
- add the new first word to uniquewords
- delete the first word and all its duplicates from the original list
- etc...
- until the original list is empty....
- count how many times each word in uniquewords appears in the original list
- find top 3 and print
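For reference, the steps above can be sketched roughly as follows. This is only an illustration: the shortened list is a made-up stand-in, and the loop works on a copy (an assumption not stated in the steps) because the original list has to survive for the counting step.

```python
# Hypothetical shortened stand-in for the list in the question.
words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'And', 'and', 'Jellicle', '']

remaining = list(words)  # copy, so the original list is still intact for counting
uniquewords = []
while remaining:
    first = remaining[0]
    uniquewords.append(first)
    # Delete the first word and all of its duplicates.
    remaining = [w for w in remaining if w != first]

# Count each capitalized unique word in the original list.
counts = [(w, words.count(w)) for w in uniquewords if w and w[0].isupper()]
counts.sort(key=lambda pair: pair[1], reverse=True)
print(counts[:3])  # [('Jellicle', 3), ('Cats', 2), ('And', 1)]
```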
The simple way of doing this would be (assuming your list is in 'l'):
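The original snippet is not shown here, so what follows is only a minimal sketch of one simple way to do it. The list name `l` comes from the answer; the shortened data and the names `counter` and `top_three` are assumptions made for illustration.

```python
# Hypothetical shortened stand-in for the question's list, stored in 'l'.
l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'And', '']

counter = {}
for word in l:
    # Skip empty strings and words that do not start with a capital letter.
    if word and word[0].isupper():
        counter[word] = counter.get(word, 0) + 1

# Sort (word, count) pairs by count, highest first, and keep the top three.
top_three = sorted(counter.items(), key=lambda item: item[1], reverse=True)[:3]
print(top_three)
```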
Complete sample:
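A complete sample along these lines might look like the following. The helper name `top_capitalized` and its `n` parameter are hypothetical, introduced only for this sketch; the data is the list from the question.

```python
# The list from the question.
l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
     'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
     'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
     'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
     'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
     'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
     'Moon', 'to', 'rise.', '']


def top_capitalized(words, n=3):
    """Return the n most frequent words that start with a capital letter."""
    counter = {}
    for word in words:
        # word[:1] is '' for an empty string, so isupper() is safely False.
        if word[:1].isupper():
            counter[word] = counter.get(word, 0) + 1
    # Sort by count, highest first.
    return sorted(counter.items(), key=lambda item: item[1], reverse=True)[:n]


print(top_capitalized(l))  # [('Jellicle', 6), ('Cats', 5), ('And', 2)]
```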
By simple I mean that it works in nearly every version of Python.
If you don't understand some of the functions used in this sample, you can always do this in the interpreter (after pasting the code above):
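The interpreter commands are not shown here; presumably they were calls to the built-in `help()`. As a sketch, assuming the functions in question are `sorted` and `dict.get` (substitute whichever ones you are unsure about):

```python
# help() prints the built-in documentation for any object.
help(sorted)    # explains the key= and reverse= arguments
help(dict.get)  # explains how get() returns a default for missing keys
```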
If you are using Counter, or have created your own Counter-style dict, and want to show the name of each item and its count, you can iterate over the dictionary like so:
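A minimal sketch of such a loop, assuming the counts live in a `collections.Counter` (a plain dict mapping words to counts works the same way); the sample data is made up:

```python
from collections import Counter

# Hypothetical sample data.
counts = Counter(['Jellicle', 'Cats', 'Jellicle', 'Moon'])

# items() yields (key, value) pairs, here (word, count).
for word, count in counts.items():
    print(word, count)
```

With a Counter, looping over `counts.most_common()` instead gives the pairs ordered from most to least frequent.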
or to iterate through this in a template:
Hope this helps someone
If you are using an earlier version of Python, or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.
Top Tip: The interactive Python interpreter is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.
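The dict-based approach mentioned above could look something like this. It is a sketch rather than the original answer's code; the shortened list and the variable names are assumptions.

```python
from operator import itemgetter

# Hypothetical shortened sample; substitute the full list from the question.
words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'And', 'Jellicle', '']

word_counts = {}
for word in words:
    # Ignore empty strings and words without a leading capital.
    if not word or not word[0].isupper():
        continue
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

# Highest counts first, then keep the top three.
top_three = sorted(word_counts.items(), key=itemgetter(1), reverse=True)[:3]
print(top_three)  # [('Jellicle', 3), ('Cats', 2), ('And', 1)]
```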
nltk is convenient for a lot of language processing stuff. It has methods for frequency distribution built in. Something like:
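A rough sketch using `nltk.FreqDist`, which in NLTK 3 is a subclass of `collections.Counter` and so offers `most_common()`. NLTK is a third-party package; the fallback import is an assumption added here so the snippet still runs without it, and the sample list is made up.

```python
try:
    from nltk import FreqDist  # third-party: pip install nltk
except ImportError:
    # Stand-in so the sketch still runs without NLTK installed.
    from collections import Counter as FreqDist

# Hypothetical shortened sample; substitute the full list from the question.
words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'And', 'Jellicle', '']

# Keep only capitalized words before building the frequency distribution.
fdist = FreqDist(w for w in words if w[:1].isupper())
print(fdist.most_common(3))
```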
Isn't it just this?
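The snippet itself is not shown; presumably it used `collections.Counter` (Python 2.7 and later), roughly like this, with the list from the question assumed to be in `words`:

```python
from collections import Counter

# The list from the question.
words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
         'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
         'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
         'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
         'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
         'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
         'Moon', 'to', 'rise.', '']

# Top three over all words, capitalized or not.
print(Counter(words).most_common(3))

# Restricting to capitalized words, as the question asks, needs a filter first.
capitalized = Counter(w for w in words if w[:1].isupper())
print(capitalized.most_common(3))  # [('Jellicle', 6), ('Cats', 5), ('And', 2)]
```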
Which should output
[('Jellicle', 6), ('Cats', 5), ('are', 3)]
A simple two-line solution that does not require any extra modules is the following code:
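The code is not shown here, so this is a reconstruction from the explanation that follows, not necessarily the original. The name `lst` comes from that explanation; the final `[:3]` slice is an assumption added to keep only the top three.

```python
lst = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
       'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
       'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
       'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
       'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
       'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
       'Moon', 'to', 'rise.', '']
# Unique, non-empty, capitalized words, most frequent first, top three only.
top = sorted([w for w in set(lst) if w and w[0].isupper()], key=lst.count, reverse=True)[:3]
print(top)  # ['Jellicle', 'Cats', 'And']
```

Note that `lst.count` rescans the whole list for every unique word, which is fine for a list this size but slow for large inputs.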
Output:
The term in square brackets returns all unique strings in the list that are not empty and start with a capital letter. The sorted() function then sorts them by how often they appear in the list (by using the lst.count key) in reverse order.