Conditional sum in Python based on row input

Posted 2019-05-07 00:44

I'm trying to do a conditional sum-product in Python. The simplified idea is as follows:

A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

I would like to have as output

Total1 = 0.50*1 + 0.25*1
Total2 = 0.99*2
Total3 = 0.80*3 + 0.70*3 + 0.20*3

I was thinking of using a FOR ... IF ... structure to specify that, for a given value in A, all corresponding values in B should be summed.
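
As a rough sketch of that FOR ... IF ... idea taken literally (the names `totals`, `category` and `total` are only illustrative): an outer loop over each distinct value of A, and an inner loop with an `if` that picks out the matching rows. It gives the right totals, but it rescans the whole list once per category, so on a huge dataset with many categories it gets slow. The one-pass approaches in the answers avoid that.

```python
A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

totals = {}
# Outer loop: one pass over the data per distinct category (sorted for stable order)
for category in sorted(set(A)):
    total = 0.0
    # Inner loop with an if: only rows whose A value equals this category contribute
    for a, b in zip(A, B):
        if a == category:
            total += a * b
    totals[category] = total

print(totals)
```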

In reality it's a huge dataset, so the script will have to be able to loop through all the categories by itself.

At the moment I'm struggling to translate this idea into an appropriate Python script. Can somebody point me in the right direction?

4 answers
Viruses.
#2 · 2019-05-07 00:56

I think you can solve this using itertools.groupby:

import itertools
from operator import itemgetter

results = [group * sum(v[1] for v in values)
           for group, values in itertools.groupby(zip(A, B), itemgetter(0))]

This assumes that all the equal numbers in A are adjacent to one another. If they might not be, you'd either need to sort them or use a different algorithm.
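
A minimal sketch of the "sort them" option, assuming a hypothetical input where equal values of A are scattered: sorting the (a, b) pairs by their first element makes equal keys adjacent, after which the same groupby expression works. Sorting costs O(n log n), which is usually acceptable even for large lists.

```python
import itertools
from operator import itemgetter

# Hypothetical input where equal values of A are NOT adjacent
A = [3, 1, 2, 1, 3, 3]
B = [0.80, 0.50, 0.99, 0.25, 0.70, 0.20]

# Sort the pairs by the A value so that groupby sees each key as one contiguous run
pairs = sorted(zip(A, B), key=itemgetter(0))
results = [group * sum(v[1] for v in values)
           for group, values in itertools.groupby(pairs, itemgetter(0))]
print(results)
```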

叛逆
#3 · 2019-05-07 00:58

If you don't mind using numpy for this, and assuming that equal values in A are contiguous (e.g. sorted), you can do it like this:

import numpy as np

A = np.asarray([1, 1, 2, 3, 3, 3])
B = np.asarray([0.50, 0.25, 0.99, 0.80, 0.70, 0.20])
# index[i] is True where A[i] is the last element of its group
index = np.full(len(A), True)
index[:-1] = A[1:] != A[:-1]
prods = A*B

# result: reduceat sums prods between consecutive group start indices
res = np.add.reduceat(prods, np.append([0], (np.where(index)[0]+1)[:-1]))

Additionally, since you have large lists, this vectorized approach could really speed things up.
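
If the groups are not guaranteed to be contiguous, a variant that is still vectorized (a sketch, swapping reduceat for `np.unique` plus `np.bincount`) maps each value of A to a bin index and lets `bincount` sum the products per bin. `keys` comes back sorted, and `totals[j]` belongs to `keys[j]`.

```python
import numpy as np

# Groups need not be sorted or contiguous here
A = np.asarray([3, 1, 2, 1, 3, 3])
B = np.asarray([0.80, 0.50, 0.99, 0.25, 0.70, 0.20])

# keys: sorted distinct values of A; inverse[i] is the position of A[i] in keys
keys, inverse = np.unique(A, return_inverse=True)
# bincount with weights adds up the A*B products that fall into each bin
totals = np.bincount(inverse, weights=A * B)
print(dict(zip(keys.tolist(), totals.tolist())))
```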

Lonely孤独者°
#4 · 2019-05-07 01:11

That seems like an excellent fit for itertools.groupby (assuming equal values in A are adjacent, e.g. sorted; for A=[1,1,2,2,1] it would report key 1 twice instead of combining it):

from itertools import groupby
A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

for key, grp in groupby(zip(A, B), key=lambda x: x[0]):
    grp = [i[1] for i in grp]
    print(key, key * sum(grp))

which prints:

1 0.75
2 1.98
3 5.1

You could also store it in a list instead of printing the values:

res = []
for key, grp in groupby(zip(A, B), key=lambda x: x[0]):
    grp = [i[1] for i in grp]
    res.append(key*sum(grp))
print(res)
# [0.75, 1.98, 5.1]
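
To make the caveat about unsorted input concrete, here is a small sketch with the hypothetical input A=[1,1,2,2,1]: groupby only merges consecutive equal keys, so key 1 produces two separate entries. Sorting zip(A, B) first, or accumulating into a dict, gives one total per key.

```python
from itertools import groupby

# Hypothetical input in which key 1 appears in two separate runs
A = [1, 1, 2, 2, 1]
B = [0.50, 0.25, 0.99, 0.80, 0.70]

res = []
for key, grp in groupby(zip(A, B), key=lambda x: x[0]):
    res.append((key, key * sum(i[1] for i in grp)))

# groupby only merges consecutive equal keys, so key 1 shows up twice
print([key for key, _ in res])  # [1, 2, 1]
```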

If a third-party package is an option for you, you could also use iteration_utilities.groupedby:

>>> from iteration_utilities import groupedby
>>> from operator import itemgetter, add

>>> {key: key*sum(value) for key, value in groupedby(zip(A, B), key=itemgetter(0), keep=itemgetter(1)).items()}
{1: 0.75, 2: 1.98, 3: 5.1}

or using the reduce parameter of groupedby directly:

>>> groupedby(zip(A, B), key=itemgetter(0), keep=lambda x: x[0]*x[1], reduce=add)
{1: 0.75, 2: 1.98, 3: 5.1}

Disclaimer: I'm the author of the iteration_utilities package.

Juvenile、少年°
#5 · 2019-05-07 01:15

I've come up with something like this. There is an edge case (a key that isn't in the dict yet) that I don't know how to handle cleanly and that hopefully could be removed:

In [1]: sums = {}
In [2]: A = [1, 1, 2, 3, 3, 3]
   ...: B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]
In [3]: for count, item in zip(A, B):
    ...:     try:
    ...:         sums[count] += item * count
    ...:     except KeyError:
    ...:         sums[count] = item * count
    ...:         

In [4]: sums
Out[4]: {1: 0.75, 2: 1.98, 3: 5.1}
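
A minimal sketch of a plain-dict way around the try/except: dict.get(key, 0) returns 0 for a key that hasn't been seen yet, so the first occurrence needs no special case.

```python
A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

sums = {}
for count, item in zip(A, B):
    # dict.get returns 0 for a missing key, so no KeyError handling is needed
    sums[count] = sums.get(count, 0) + item * count

print(sums)
```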

Edit:

As suggested in the comments, defaultdict could be used to get rid of this ugly try-except block:

In [2]: from collections import defaultdict

In [3]: sums = defaultdict(lambda: 0)

In [4]: sums[1]
Out[4]: 0

In [5]: sums
Out[5]: defaultdict(<function __main__.<lambda>>, {1: 0})

Edit 2:

Well, I've learned something today. After more comments:

In [6]: sums = defaultdict(int)

In [7]: A = [1, 1, 2, 3, 3, 3]
   ...: B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

In [8]: for count, item in zip(A, B):
   ...:     sums[count] += count * item
   ...:     

In [9]: sums
Out[9]: defaultdict(int, {1: 0.75, 2: 1.98, 3: 5.1})