Find the most common element in a list

2019-01-02 16:48发布

What is an efficient way to find the most common element in a Python list?

My list items may not be hashable so can't use a dictionary. Also in case of draws the item with the lowest index should be returned. Example:

>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

标签: python list
19条回答
与君花间醉酒
2楼-- · 2019-01-02 17:25

You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)

itemList = ['hi', 'hi', 'hello', 'bye']

counter = {}
maxItemCount = 0
for item in itemList:
    try:
        # Referencing this will cause a KeyError exception
        # if it doesn't already exist
        counter[item]
        # ... meaning if we get this far it didn't happen so
        # we'll increment
        counter[item] += 1
    except KeyError:
        # If we got a KeyError we need to create the
        # dictionary key
        counter[item] = 1

    # Keep overwriting maxItemCount with the latest number,
    # if it's higher than the existing itemCount
    if counter[item] > maxItemCount:
        maxItemCount = counter[item]
        mostPopularItem = item

print mostPopularItem
查看更多
公子世无双
3楼-- · 2019-01-02 17:26

Here:

def most_common(l):
    max = 0
    maxitem = None
    for x in set(l):
        count =  l.count(x)
        if count > max:
            max = count
            maxitem = x
    return maxitem

I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.

查看更多
萌妹纸的霸气范
4楼-- · 2019-01-02 17:29

Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.

查看更多
春风洒进眼中
5楼-- · 2019-01-02 17:30

If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.

def most_common(lst):
    cur_length = 0
    max_length = 0
    cur_i = 0
    max_i = 0
    cur_item = None
    max_item = None
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
        if cur_item is None or cur_item != item:
            if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
                max_length = cur_length
                max_i = cur_i
                max_item = cur_item
            cur_length = 1
            cur_i = i
            cur_item = item
        else:
            cur_length += 1
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
        return cur_item
    return max_item
查看更多
心情的温度
6楼-- · 2019-01-02 17:30

This is an O(n) solution.

mydict   = {}
cnt, itm = 0, ''
for item in reversed(lst):
     mydict[item] = mydict.get(item, 0) + 1
     if mydict[item] >= cnt :
         cnt, itm = mydict[item], item

print itm

(reversed is used to make sure that it returns the lowest index item)

查看更多
春风洒进眼中
7楼-- · 2019-01-02 17:30

I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.

def mostPopular(l):
    mpEl=None
    mpIndex=0
    mpCount=0
    curEl=None
    curCount=0
    for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
        curCount=curCount+1 if el==curEl else 1
        curEl=el
        if curCount>mpCount \
        or (curCount==mpCount and i<mpIndex):
            mpEl=curEl
            mpIndex=i
            mpCount=curCount
    return mpEl, mpCount, mpIndex

I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.

查看更多
登录 后发表回答