How can I sum across rows that have equal values in the first column of a numpy array? For example:
In: np.array([[1,2,3],
[1,4,6],
[2,3,5],
[2,6,2],
[3,4,8]])
Out: [[1,6,9], [2,9,7], [3,4,8]]
Any help would be greatly appreciated.
Pandas has a very powerful groupby function that makes this simple.
With a little help from your friends np.unique and np.add.at:
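The snippet that accompanied this answer is not in the text, so the following is only a minimal sketch of how np.unique and np.add.at could be combined for this task. The variable names (keys, inv, sums, result) are illustrative, not the original author's.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 4, 6],
              [2, 3, 5],
              [2, 6, 2],
              [3, 4, 8]])

# keys: sorted unique values of the first column
# inv:  for each row, the index of its key inside `keys`
keys, inv = np.unique(a[:, 0], return_inverse=True)

# One accumulator row per key, one column per summed column
sums = np.zeros((keys.size, a.shape[1] - 1), dtype=a.dtype)

# Unbuffered in-place add: rows that share a group index are all accumulated
np.add.at(sums, inv, a[:, 1:])

result = np.column_stack((keys, sums))
print(result)
# [[1 6 9]
#  [2 9 7]
#  [3 4 8]]
```

This version does not need the first column to be sorted, since np.unique maps every row to its group regardless of order.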
Approach #1
Here's a vectorized, NumPy-idiomatic approach based on np.bincount, with a sample input and output:
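The original code for this approach is missing from the text. One way it could look, as a sketch: map each (group, column) pair to a flat bin and let np.bincount do the weighted sums. The function name group_sum_bincount is hypothetical.

```python
import numpy as np

def group_sum_bincount(a):
    """Sum columns 1.. of `a` over rows sharing the same first-column value."""
    keys, ids = np.unique(a[:, 0], return_inverse=True)
    n_groups = keys.size
    n_cols = a.shape[1] - 1

    # Flat bin index for every element: group id * n_cols + column offset
    flat_ids = ids[:, None] * n_cols + np.arange(n_cols)

    # bincount with weights sums the values that fall into each bin.
    # It returns float64, so the cast back is exact only while the sums
    # stay within float64's exact integer range.
    sums = np.bincount(flat_ids.ravel(),
                       weights=a[:, 1:].ravel(),
                       minlength=n_groups * n_cols)
    sums = sums.reshape(n_groups, n_cols).astype(a.dtype)
    return np.column_stack((keys, sums))

# Sample input, output
a = np.array([[1, 2, 3],
              [1, 4, 6],
              [2, 3, 5],
              [2, 6, 2],
              [3, 4, 8]])
out = group_sum_bincount(a)
print(out)
# [[1 6 9]
#  [2 9 7]
#  [3 4 8]]
```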
Approach #2
Here's another based on np.cumsum and np.diff:
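Again the original snippet is not available, so this is a sketch under one stated assumption: rows with equal first-column values are adjacent (as in the question's example, where the column is sorted) and the array is non-empty. The idea is to take a running sum down the rows, sample it at the last row of each group, and difference those samples. The name group_sum_cumsum is hypothetical.

```python
import numpy as np

def group_sum_cumsum(a):
    """Group sums via cumsum + diff; assumes equal keys are contiguous."""
    keys = a[:, 0]

    # Index of the last row of each group: where the key changes, plus the final row
    last = np.flatnonzero(np.concatenate((keys[1:] != keys[:-1], [True])))

    # Running totals sampled at each group's last row
    csum = np.cumsum(a[:, 1:], axis=0)[last]

    # Differences of consecutive samples give per-group sums;
    # prepend=0 keeps the first group's total intact
    sums = np.diff(csum, axis=0, prepend=0)
    return np.column_stack((keys[last], sums))

a = np.array([[1, 2, 3],
              [1, 4, 6],
              [2, 3, 5],
              [2, 6, 2],
              [3, 4, 8]])
out = group_sum_cumsum(a)
print(out)
# [[1 6 9]
#  [2 9 7]
#  [3 4 8]]
```

If the first column is not already grouped, sorting the rows first (for example with a[a[:, 0].argsort()]) would satisfy the assumption.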
Benchmarking
Here are some runtime tests for the NumPy-based approaches presented so far for the question:
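The timings themselves are not included in the text. A small harness along the following lines could be used to rerun such a comparison; the three functions are illustrative reconstructions of the approaches (their names and the test data sizes are assumptions), and no numbers are shown because they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

def via_add_at(a):
    keys, inv = np.unique(a[:, 0], return_inverse=True)
    sums = np.zeros((keys.size, a.shape[1] - 1), dtype=a.dtype)
    np.add.at(sums, inv, a[:, 1:])
    return np.column_stack((keys, sums))

def via_bincount(a):
    keys, ids = np.unique(a[:, 0], return_inverse=True)
    n = a.shape[1] - 1
    flat = (ids[:, None] * n + np.arange(n)).ravel()
    sums = np.bincount(flat, weights=a[:, 1:].ravel(), minlength=keys.size * n)
    return np.column_stack((keys, sums.reshape(-1, n).astype(a.dtype)))

def via_cumsum_diff(a):
    # Assumes equal keys are contiguous (true for the sorted data below)
    keys = a[:, 0]
    last = np.flatnonzero(np.concatenate((keys[1:] != keys[:-1], [True])))
    csum = np.cumsum(a[:, 1:], axis=0)[last]
    return np.column_stack((keys[last], np.diff(csum, axis=0, prepend=0)))

# Synthetic data: 10,000 rows, sorted keys in [0, 100), two value columns
rng = np.random.default_rng(0)
keys = np.sort(rng.integers(0, 100, size=10_000))
data = np.column_stack((keys, rng.integers(0, 10, size=(10_000, 2))))

repeats = 20
for func in (via_add_at, via_bincount, via_cumsum_diff):
    total = timeit.timeit(lambda: func(data), number=repeats)
    print(f"{func.__name__}: {total / repeats * 1e3:.3f} ms per call")
```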
It seems that Approach #2 (cumsum + diff) performs quite well.

Try using pandas. Group by the first column and then sum row-wise. Something like:
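The code that followed is cut off in the text. A minimal sketch of the pandas route, assuming the array is wrapped in a DataFrame with illustrative column names ("key", "x", "y" are not from the original):

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3],
              [1, 4, 6],
              [2, 3, 5],
              [2, 6, 2],
              [3, 4, 8]])

# Column names are placeholders chosen for this example
df = pd.DataFrame(a, columns=["key", "x", "y"])

# Group rows by the first column and sum the remaining columns per group;
# as_index=False keeps "key" as a regular column in the result
grouped = df.groupby("key", as_index=False).sum()

result = grouped.to_numpy()
print(result)
# [[1 6 9]
#  [2 9 7]
#  [3 4 8]]
```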