可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm trying to do a conditional sum-product in Python. The simplified idea is as follows:

A = [1 1 2 3 3 3]
B = [0.50 0.25 0.99 0.80 0.70 0.20]

I would like to have as output

Total1 = 0.50*1 + 0.25*1
Total2 = 0.99*2
Total3 = 0.80*3 + 0.70*3 + 0.20*3

I was thinking to use a FOR ... IF... structure, to specify that for a given value in A all corresponding values in B should be summed.

In reality it's a huge dataset, so I will have to make the script capable to loop through all categories?

At this moment I'm struggling to get the idea translated to an appropriate Python script. Can somebody point me to the right direction?

回答1:

That seems like an excellent fit for itertools.groupby (assuming the values in A are sorted, it probably wouldn't work correctly for A=[1,1,2,2,1]):

from itertools import groupby
A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

for key, grp in groupby(zip(A, B), key=lambda x: x[0]):
    grp = [i[1] for i in grp]
    print(key, key * sum(grp))

which prints:

1 0.75
2 1.98
3 5.1

You could also store it in a list instead of printing the values:

res = []
for key, grp in groupby(zip(A, B), key=lambda x: x[0]):
    grp = [i[1] for i in grp]
    res.append(key*sum(grp))
print(res)
# [0.75, 1.98, 5.1]

In case a 3rd party package might be an option for you, you could also use iteration_utilities.groupedby:

>>> from iteration_utilities import groupedby
>>> from operator import itemgetter, add

>>> {key: key*sum(value) for key, value in groupedby(zip(A, B), key=itemgetter(0), keep=itemgetter(1)).items()}
{1: 0.75, 2: 1.98, 3: 5.1}

or using the reduce parameter of groupedby directly:

>>> groupedby(zip(A, B), key=itemgetter(0), keep=lambda x: x[0]*x[1], reduce=add)
{1: 0.75, 2: 1.98, 3: 5.1}

Disclaimer: I'm the author of the iteration_utilities package.

回答2:

I've come up with something like this. There is edge case I have no idea what to do with and which hopefully could be removed:

In [1]: sums = {}
In [2]: A = [1, 1, 2, 3, 3, 3]
   ...: B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]
In [3]: for count, item in zip(A, B):
    ...:     try:
    ...:         sums[count] += item * count
    ...:     except KeyError:
    ...:         sums[count] = item * count
    ...:         

In [4]: sums
Out[5]: {1: 0.75, 2: 1.98, 3: 5.1}

Edit:

As suggested in comments deafultdict could be used to get rid of this ugly try-except block:

In [2]: from collections import defaultdict

In [3]: sum = defaultdict(lambda: 0)

In [4]: sum[1]
Out[4]: 0

In [5]: sum
Out[5]: defaultdict(<function __main__.<lambda>>, {1: 0})

EDIT2:

Well, I've learned something today. After more comments:

In [6]: sums = defaultdict(int)

In [7]: A = [1, 1, 2, 3, 3, 3]
   ...: B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

In [8]: for count, item in zip(A, B):
   ...:     sums[count] += count * item
   ...:     

In [9]: sums
Out[9]: defaultdict(int, {1: 0.75, 2: 1.98, 3: 5.1})

回答3:

I think you can solve this using itertools.groupby:

import itertools
from operator import itemgetter

results = [group * sum(v[1] for v in values)
           for group, values in itertools.groupby(zip(A, B), itemgetter(0))]

This assumes that all the equal numbers in A are adjacent to one another. If they might not be, you'd either need to sort them or use a different algorithm.

回答4:

If you don't mind using numpy for this and assuming that the groups are ordered, you can do it by:

A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]
A = np.asarray([1, 1, 2, 3, 3, 3])
B = np.asarray([0.50, 0.25, 0.99, 0.80, 0.70, 0.20])
index = np.full(len(A),True)
index[:-1] = A[1:] != A[:-1]
prods = A*B

#result
res = np.add.reduceat(prods, np.append([0], (np.where(index)[0]+1)[:-1]))

Additionally, given you have large lists, this could really speed up operations