Python list grouping and sum

Posted 2020-07-23 07:14

Question:

Suppose I have the following list:

[{'name': 'Amy', 'count': 1}, {'name': 'Amy', 'count': 2}, {'name': 'Peter', 'count': 1}]

How could I group it by name and sum the counts to get the following output:

[{'name': 'Amy', 'count': 3}, {'name': 'Peter', 'count': 1}]

Thanks.

Answer 1:

You can use a collections.Counter:

from collections import Counter
l = [
    {'name': 'Amy', 'count': 1},
    {'name': 'Amy', 'count': 2}, 
    {'name': 'Peter', 'count': 1}
]
c = Counter()
for v in l:
    c[v['name']] += v['count']

Result:

>>> c
Counter({'Amy': 3, 'Peter': 1})
>>> [{'name': name, 'count': count} for name, count in c.items()]
[{'count': 3, 'name': 'Amy'}, {'count': 1, 'name': 'Peter'}]


Answer 2:

Alternatively, you can use the pandas groupby function:

import pandas as pd

df = pd.DataFrame([{'name': 'Amy', 'count': 1},
                   {'name': 'Amy', 'count': 2},
                   {'name': 'Peter', 'count': 1}])

df.groupby("name").sum()

       count
name        
Amy        3
Peter      1


Answer 3:

You could pivot the list using a defaultdict, as explained in the documentation:

>>> l = [{'name': 'Amy', 'count': 1},
         {'name': 'Amy', 'count': 2},
         {'name': 'Peter', 'count': 1}]

# Pivot operation
>>> import collections
>>> pivot = collections.defaultdict(list)
>>> for item in l:
...     pivot[item['name']].append(item['count'])
... 
>>> pivot
defaultdict(<class 'list'>, {'Peter': [1], 'Amy': [1, 2]})

After that, you simply rebuild the desired output using a list comprehension:

>>> [{'name':k, 'count':sum(values)} for k, values in pivot.items()]
[{'name': 'Peter', 'count': 1}, {'name': 'Amy', 'count': 3}]

I must admit this is not necessarily the most efficient approach, but given your data structure, I guess the pivot operation would be useful in several other scenarios, not necessarily ones that involve summing.
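To illustrate the point about reuse, here is a minimal sketch, assuming the same input list, that builds the pivot once and then derives more than one aggregate from it. The names `records` and `summary`, and the output keys `total`, `largest` and `entries`, are made up for this example.

```python
from collections import defaultdict

records = [
    {'name': 'Amy', 'count': 1},
    {'name': 'Amy', 'count': 2},
    {'name': 'Peter', 'count': 1},
]

# Pivot: map each name to the list of its counts.
pivot = defaultdict(list)
for item in records:
    pivot[item['name']].append(item['count'])

# The same pivot can feed aggregates other than a plain sum.
summary = [
    {
        'name': name,
        'total': sum(values),
        'largest': max(values),
        'entries': len(values),
    }
    for name, values in pivot.items()
]
print(summary)
```

If only the sum is ever needed, keeping the intermediate lists is extra work; the pivot pays off when several aggregates are read from the same grouping.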



Answer 4:

I wanted to suggest using a defaultdict, as Sylvain Leroux has in his answer.

However, it is not necessary to collect the counts into a list; you can sum them as you go using a defaultdict(int):

from collections import defaultdict

l = [{'name': 'Amy', 'count': 1}, {'name': 'Amy', 'count': 2}, {'name': 'Peter', 'count': 1}]

counts = defaultdict(int)
for d in l:
    counts[d['name']] += d['count']

counts = [{'name': k, 'count': v} for k,v in counts.items()]

>>> print(counts)
[{'name': 'Amy', 'count': 3}, {'name': 'Peter', 'count': 1}]

This should be more efficient than building lists and summing them.

itertools.groupby is another option, but it requires sorting the list by the name key first, which might be less efficient on longer lists.
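A small sketch of why the sort matters: itertools.groupby only merges consecutive items with equal keys, so an unsorted list can yield the same name more than once. The list `unsorted` below is invented for this illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input in which the two 'Amy' entries are not adjacent.
unsorted = [
    {'name': 'Amy', 'count': 1},
    {'name': 'Peter', 'count': 1},
    {'name': 'Amy', 'count': 2},
]

by_name = itemgetter('name')

# Without sorting, 'Amy' shows up as two separate groups.
keys_unsorted = [k for k, _ in groupby(unsorted, key=by_name)]

# After sorting by the same key, each name forms exactly one group.
keys_sorted = [k for k, _ in groupby(sorted(unsorted, key=by_name), key=by_name)]

print(keys_unsorted)  # ['Amy', 'Peter', 'Amy']
print(keys_sorted)    # ['Amy', 'Peter']
```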



Answer 5:

import itertools as it
import operator as op

l = [{'name': 'Amy', 'count': 1}, {'name': 'Amy', 'count': 2}, {'name': 'Peter', 'count': 1}]

Sort the list by the 'name' key of each dict.

sl = sorted(l,key=op.itemgetter('name'))  

Pass the sorted list to groupby, using the 'name' key of each dict as the grouping key. It yields tuples of the key and an iterator over the list items grouped under that key, e.g. ('Amy',<itertools._grouper object at 0xb5fdac2c>).

On each iteration, that iterator yields one element of the list that has 'Amy' as the value of its 'name' key.

To get the total of the 'count' key, call sum over all the 'count' fields in the group, like sum(map(op.itemgetter('count'),g)).

To build the list of dicts, call dict with the first element of the tuple returned by groupby as the value for the 'name' key and the total returned by sum as the value for the 'count' key.

[ dict(name=k,count=sum(map(op.itemgetter('count'),g))) 
    for k,g in it.groupby(sl, key=op.itemgetter('name'))]
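Putting the steps above together, a self-contained sketch might look like the following. The function name `group_and_sum` is only for illustration and is not part of any library.

```python
import itertools as it
import operator as op


def group_and_sum(records):
    """Sum 'count' per 'name' using sorted() followed by itertools.groupby."""
    by_name = op.itemgetter('name')
    # groupby only groups adjacent items, so sort by the grouping key first.
    sl = sorted(records, key=by_name)
    return [
        dict(name=k, count=sum(map(op.itemgetter('count'), g)))
        for k, g in it.groupby(sl, key=by_name)
    ]


l = [{'name': 'Amy', 'count': 1}, {'name': 'Amy', 'count': 2}, {'name': 'Peter', 'count': 1}]
result = group_and_sum(l)
print(result)  # [{'name': 'Amy', 'count': 3}, {'name': 'Peter', 'count': 1}]
```

Because of the sort, the result comes back ordered by name rather than by first appearance in the input.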