Conditional sum in Python based on row input

I'm trying to do a conditional sum-product in Python. The simplified idea is as follows:

A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

I would like to have as output

Total1 = 0.50*1 + 0.25*1
Total2 = 0.99*2
Total3 = 0.80*3 + 0.70*3 + 0.20*3

I was thinking of using a for ... if ... structure to specify that, for a given value in A, all corresponding values in B should be summed.

In reality it's a huge dataset, so the script will have to loop through all categories.

At the moment I'm struggling to translate the idea into an appropriate Python script. Can somebody point me in the right direction?

Tags: python python-3.x sum conditional

4 Answers

Viruses.

#2 · 2019-05-07 00:56

I think you can solve this using itertools.groupby:

import itertools
from operator import itemgetter

results = [group * sum(v[1] for v in values)
           for group, values in itertools.groupby(zip(A, B), itemgetter(0))]

This assumes that all the equal numbers in A are adjacent to one another. If they might not be, you'd either need to sort them or use a different algorithm.
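If equal values in A may not be adjacent, one option is to sort the pairs by key before grouping. The following is only a minimal sketch under that assumption; the function name grouped_sumproduct and the shuffled sample data are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def grouped_sumproduct(keys, values):
    """Return {key: key * sum of the values that share that key}."""
    # Sort the pairs by key only, so equal keys become adjacent for groupby.
    pairs = sorted(zip(keys, values), key=itemgetter(0))
    return {
        key: key * sum(value for _, value in group)
        for key, group in groupby(pairs, key=itemgetter(0))
    }


# Hypothetical unsorted input: the data from the question, shuffled.
A = [3, 1, 2, 3, 1, 3]
B = [0.80, 0.50, 0.99, 0.70, 0.25, 0.20]
totals = grouped_sumproduct(A, B)
print(totals)
```

Sorting costs O(n log n), so for a very large input it may be worth checking first whether the data is already ordered.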


叛逆

#3 · 2019-05-07 00:58

If you don't mind using numpy for this, and assuming that the groups are ordered, you can do it like this:

import numpy as np

A = np.asarray([1, 1, 2, 3, 3, 3])
B = np.asarray([0.50, 0.25, 0.99, 0.80, 0.70, 0.20])

# index[i] is True where element i is the last one of its group
index = np.full(len(A), True)
index[:-1] = A[1:] != A[:-1]
prods = A * B

# result: sum prods over each group (a group starts at 0 or right after a group end)
res = np.add.reduceat(prods, np.append([0], (np.where(index)[0] + 1)[:-1]))

Additionally, since you have large lists, a vectorized approach like this could speed up the computation considerably.
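If the groups are not ordered, a variant that does not rely on adjacency is to map each value of A to a group index with np.unique and then sum the products per index with np.bincount. This is a different technique from the reduceat approach, sketched here on made-up shuffled data:

```python
import numpy as np

# Hypothetical unsorted input: the data from the question, shuffled.
A = np.asarray([3, 1, 2, 3, 1, 3])
B = np.asarray([0.80, 0.50, 0.99, 0.70, 0.25, 0.20])

# keys: sorted unique values of A
# inverse: for each element of A, the position of its value in keys
keys, inverse = np.unique(A, return_inverse=True)

# Sum the products A * B separately for each group index.
res = np.bincount(inverse, weights=A * B)

print(dict(zip(keys.tolist(), res.tolist())))
```

Here res[i] holds the total for keys[i], so the two arrays can be zipped into a dict as shown.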


Lonely孤独者°

#4 · 2019-05-07 01:11

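That seems like an excellent fit for itertools.groupby. Note that this assumes equal values in A are adjacent (for example, because A is sorted): groupby only groups consecutive items, so A=[1,1,2,2,1] would produce two separate groups for the key 1. For the data in the question: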

from itertools import groupby
A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

for key, grp in groupby(zip(A, B), key=lambda x: x[0]):
    grp = [i[1] for i in grp]
    print(key, key * sum(grp))

which prints:

1 0.75
2 1.98
3 5.1

You could also store it in a list instead of printing the values:

res = []
for key, grp in groupby(zip(A, B), key=lambda x: x[0]):
    grp = [i[1] for i in grp]
    res.append(key*sum(grp))
print(res)
# [0.75, 1.98, 5.1]

If a third-party package is an option for you, you could also use iteration_utilities.groupedby:

>>> from iteration_utilities import groupedby
>>> from operator import itemgetter, add

>>> {key: key*sum(value) for key, value in groupedby(zip(A, B), key=itemgetter(0), keep=itemgetter(1)).items()}
{1: 0.75, 2: 1.98, 3: 5.1}

or using the reduce parameter of groupedby directly:

>>> groupedby(zip(A, B), key=itemgetter(0), keep=lambda x: x[0]*x[1], reduce=add)
{1: 0.75, 2: 1.98, 3: 5.1}

Disclaimer: I'm the author of the iteration_utilities package.


Juvenile、少年°

#5 · 2019-05-07 01:15

I've come up with something like this. There is an edge case (the first occurrence of each key) that I don't know how to handle nicely and that hopefully could be removed:

In [1]: sums = {}
In [2]: A = [1, 1, 2, 3, 3, 3]
   ...: B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]
In [3]: for count, item in zip(A, B):
    ...:     try:
    ...:         sums[count] += item * count
    ...:     except KeyError:
    ...:         sums[count] = item * count
    ...:         

In [4]: sums
Out[4]: {1: 0.75, 2: 1.98, 3: 5.1}

Edit:

As suggested in the comments, defaultdict could be used to get rid of this ugly try-except block:

In [2]: from collections import defaultdict

In [3]: sums = defaultdict(lambda: 0)

In [4]: sums[1]
Out[4]: 0

In [5]: sums
Out[5]: defaultdict(<function __main__.<lambda>>, {1: 0})

EDIT2:

Well, I've learned something today. After more comments:

In [6]: sums = defaultdict(int)

In [7]: A = [1, 1, 2, 3, 3, 3]
   ...: B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

In [8]: for count, item in zip(A, B):
   ...:     sums[count] += count * item
   ...:     

In [9]: sums
Out[9]: defaultdict(int, {1: 0.75, 2: 1.98, 3: 5.1})
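
The same loop can be wrapped in a function for reuse. This is just a sketch: the name sum_products_by_key and the shuffled ordering of the data are made up for illustration, and defaultdict(float) is used instead of defaultdict(int) only so that missing keys start at 0.0. Note that a single pass like this does not require equal values in A to be adjacent.

```python
from collections import defaultdict


def sum_products_by_key(keys, values):
    """Accumulate key * value into one running total per distinct key."""
    totals = defaultdict(float)  # missing keys start at 0.0
    for key, value in zip(keys, values):
        totals[key] += key * value
    # Return a plain dict so that later lookups cannot silently add keys.
    return dict(totals)


# Hypothetical unsorted input: the data from the question, shuffled.
A = [3, 1, 2, 3, 1, 3]
B = [0.80, 0.50, 0.99, 0.70, 0.25, 0.20]
totals = sum_products_by_key(A, B)
print(totals)
```

Because each product is added separately, the last digits of a total can differ slightly from key * sum(values) due to floating-point rounding.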
