I'm trying to do a conditional sum-product in Python. The simplified idea is as follows:
A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]
I would like to have as output:
Total1 = 0.50*1 + 0.25*1
Total2 = 0.99*2
Total3 = 0.80*3 + 0.70*3 + 0.20*3
I was thinking of using a FOR ... IF ... structure to specify that, for a given value in A, all corresponding values in B should be summed.
In reality it's a huge dataset, so the script will have to be able to loop through all categories.
At the moment I'm struggling to translate the idea into an appropriate Python script. Can somebody point me in the right direction?
I think you can solve this using itertools.groupby: zip A and B together, group the pairs by their value from A, and sum the products within each group.
This assumes that all the equal numbers in A are adjacent to one another. If they might not be, you'd either need to sort them or use a different algorithm.
If you don't mind using numpy for this, and assuming that the groups are ordered, you can do it with array operations instead. Given that you have large lists, this could really speed things up.
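A minimal sketch of what the numpy route might look like, assuming A is sorted in ascending order so that each category forms one contiguous run. The variable names and the choice of np.unique with np.add.reduceat are illustrative assumptions, not necessarily what the answer originally used:

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3])
B = np.array([0.50, 0.25, 0.99, 0.80, 0.70, 0.20])

# Element-wise products a*b for every position.
products = A * B

# Assumption: A is sorted ascending, so np.unique returns each category
# together with the index at which its run of equal values starts.
categories, starts = np.unique(A, return_index=True)

# Sum the products over each run, from one start index up to the next.
totals = np.add.reduceat(products, starts)

for category, total in zip(categories, totals):
    print(category, total)
```

If A is grouped but not ascending, the start indices would come back out of order and this sketch would no longer sum the right slices, so sorting first would be needed.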
That seems like an excellent fit for itertools.groupby (assuming the values in A are sorted; it probably wouldn't work correctly for A = [1, 1, 2, 2, 1]), printing the total for each group as you go.
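As a rough sketch of the groupby idea, assuming A and B are plain lists of equal length and that equal values in A sit next to each other. The helper name grouped_sumproduct is hypothetical and only used for illustration:

```python
from itertools import groupby
from operator import itemgetter

A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]


def grouped_sumproduct(keys, values):
    """Yield (key, sum of key*value) for each run of equal keys."""
    pairs = zip(keys, values)
    # groupby only merges *consecutive* equal keys, hence the
    # assumption that equal values in `keys` are adjacent.
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(a * b for a, b in group)


for key, total in grouped_sumproduct(A, B):
    print(key, total)
```

With an input such as A = [1, 1, 2, 2, 1] the value 1 would come out as two separate groups. Sorting the pairs first, for example with sorted(zip(A, B), key=itemgetter(0)), is one way around that.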
You could also store the totals in a list instead of printing them.
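Collecting the totals in a list could be done with a list comprehension around the same groupby call. This is a sketch under the same assumption that A is sorted, and the name totals is illustrative:

```python
from itertools import groupby
from operator import itemgetter

A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

# One total per run of equal values in A (A is assumed to be sorted).
totals = [
    sum(a * b for a, b in group)
    for _, group in groupby(zip(A, B), key=itemgetter(0))
]
```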
In case a 3rd party package might be an option for you, you could also use iteration_utilities.groupedby, or use the reduce parameter of groupedby directly.
Disclaimer: I'm the author of the iteration_utilities package.
I've come up with something like this. There is an edge case I have no idea what to do with, and which hopefully could be removed:
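One plausible shape for this approach, assuming the totals are accumulated in a plain dictionary keyed by the values in A and that the edge case is the first occurrence of each key. This is a sketch with illustrative names, not the author's exact code:

```python
A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

totals = {}
for a, b in zip(A, B):
    try:
        # Add this product to the running total for category a.
        totals[a] += a * b
    except KeyError:
        # Edge case: first time this category shows up, so there is
        # no running total to add to yet.
        totals[a] = a * b
```

Unlike the groupby variants, a dictionary like this does not need the values in A to be sorted or adjacent.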
Edit: As suggested in the comments, defaultdict could be used to get rid of this ugly try-except block.
EDIT2: Well, I've learned something today. After more comments:
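A minimal sketch of the defaultdict suggestion from the first edit, assuming the totals are accumulated in a dictionary keyed by the values in A. The names are illustrative, and this is not necessarily the final version arrived at after the comments:

```python
from collections import defaultdict

A = [1, 1, 2, 3, 3, 3]
B = [0.50, 0.25, 0.99, 0.80, 0.70, 0.20]

# defaultdict(float) creates a 0.0 entry the first time a key is used,
# so no special handling is needed for a category's first occurrence.
totals = defaultdict(float)
for a, b in zip(A, B):
    totals[a] += a * b
```

Because missing keys start at 0.0, the loop body stays a single line, and the order of the values in A does not matter.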