I have an application where I need to sum across arbitrary groups of indices in a 3D NumPy array. The built-in NumPy array sum routine sums up all indices along one of the dimensions of an ndarray. Instead, I need to sum up ranges of indices along one of the dimensions in my array and return a new array.
For example, let's assume that I have an ndarray with shape (75,25,3). I wish to sum up the first dimension along certain index ranges and return a new 3D array. Consider the sums over 0:25, 25:50 and 50:75, which would return an array of shape (3,25,3).
Is there an easy way to do "disjoint sums" along one dimension of a NumPy array to produce this result?
You can use np.add.reduceat as a general approach to this problem. This works even if the ranges are not all the same length.
To sum the slices 0:25, 25:50 and 50:75 along axis 0, pass in indices [0, 25, 50]:
np.add.reduceat(a, [0, 25, 50], axis=0)
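As a quick sanity check, here is a minimal sketch that compares the reduceat call with plain slicing. The test array and its shape (75, 25, 3) are assumptions chosen only so that the three ranges fit.

```python
import numpy as np

# Hypothetical test array; the shape (75, 25, 3) is an assumption for illustration.
a = np.arange(75 * 25 * 3, dtype=np.int64).reshape(75, 25, 3)

# Each index starts a segment that runs up to the next index;
# the last segment runs to the end of the axis.
result = np.add.reduceat(a, [0, 25, 50], axis=0)

# Reference result computed with explicit slices.
expected = np.stack([a[0:25].sum(axis=0),
                     a[25:50].sum(axis=0),
                     a[50:75].sum(axis=0)])

print(result.shape)                      # (3, 25, 3)
print(np.array_equal(result, expected))  # True
```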
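This method can also be used to sum non-contiguous ranges. For instance, to sum the slices 0:25, 37:47 and 51:75, list the start and stop of each range and keep every second result, which discards the sums over the gaps 25:37 and 47:51: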
np.add.reduceat(a, [0,25, 37,47, 51], axis=0)[::2]
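To make the role of [::2] concrete, here is a small sketch under the same assumed test array of shape (75, 25, 3). The names boundaries, all_sums and wanted are only illustrative.

```python
import numpy as np

# Hypothetical test array, as in the question's example shape.
a = np.arange(75 * 25 * 3, dtype=np.int64).reshape(75, 25, 3)

# Boundaries alternate between range starts and range stops: 0:25, 37:47, 51:75.
# The final stop (75) is left out because the last segment runs to the end of the axis.
boundaries = [0, 25, 37, 47, 51]

all_sums = np.add.reduceat(a, boundaries, axis=0)  # 5 segments, gaps included
wanted = all_sums[::2]                             # keep segments 0, 2 and 4

# Reference result computed with explicit slices.
expected = np.stack([a[0:25].sum(axis=0),
                     a[37:47].sum(axis=0),
                     a[51:75].sum(axis=0)])

print(all_sums.shape)                    # (5, 25, 3)
print(wanted.shape)                      # (3, 25, 3)
print(np.array_equal(wanted, expected))  # True
```

If the last range stopped before the end of the axis, its stop index would have to be appended to the list as well; the [::2] step would still pick out the wanted segments.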
An alternative approach to summing ranges of the same length is to reshape the array and then sum along an axis. The equivalent to the first example above would be:
a.reshape(3, a.shape[0]//3, a.shape[1], a.shape[2]).sum(axis=1)
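A slightly more general sketch of the reshape idea, letting NumPy infer the group length with -1. The variable n_groups and the test array are assumptions for illustration; this only works when the axis length is an exact multiple of the number of groups.

```python
import numpy as np

# Hypothetical test array with 75 rows, so it splits evenly into 3 groups of 25.
a = np.arange(75 * 25 * 3, dtype=np.int64).reshape(75, 25, 3)
n_groups = 3  # must divide a.shape[0] evenly, otherwise reshape raises ValueError

# Split axis 0 into (n_groups, rows_per_group), then sum away rows_per_group.
grouped = a.reshape(n_groups, -1, *a.shape[1:])
result = grouped.sum(axis=1)

print(grouped.shape)  # (3, 25, 25, 3)
print(result.shape)   # (3, 25, 3)
print(np.array_equal(result, np.add.reduceat(a, [0, 25, 50], axis=0)))  # True
```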
Just sum each portion and use the results to create a new array.
import numpy as np
i1, i2 = (2,7)
a = np.ones((10,5,3))
b = np.sum(a[0:i1,...], 0)
c = np.sum(a[i1:i2,...], 0)
d = np.sum(a[i2:,...], 0)
g = np.array([b,c,d])
>>> g.shape
(3, 5, 3)
>>> g
array([[[ 2.,  2.,  2.],
        [ 2.,  2.,  2.],
        [ 2.,  2.,  2.],
        [ 2.,  2.,  2.],
        [ 2.,  2.,  2.]],

       [[ 5.,  5.,  5.],
        [ 5.,  5.,  5.],
        [ 5.,  5.,  5.],
        [ 5.,  5.,  5.],
        [ 5.,  5.,  5.]],

       [[ 3.,  3.,  3.],
        [ 3.,  3.,  3.],
        [ 3.,  3.,  3.],
        [ 3.,  3.,  3.],
        [ 3.,  3.,  3.]]])
You can use np.split to split your array into equal pieces, then use np.sum to sum the stacked pieces along axis 1 (the axis that runs over the rows within each piece):
np.sum(np.split(my_array,3),axis=1)
Demo:
>>> a=np.arange(270).reshape(30,3,3)
>>> np.sum(np.split(a,3),axis=1)
array([[[ 405,  415,  425],
        [ 435,  445,  455],
        [ 465,  475,  485]],

       [[1305, 1315, 1325],
        [1335, 1345, 1355],
        [1365, 1375, 1385]],

       [[2205, 2215, 2225],
        [2235, 2245, 2255],
        [2265, 2275, 2285]]])
Also note that you can pass the end indices of your slices to the np.split function to choose the split points yourself. The pieces still need to have equal lengths for np.sum to stack them, as they do here:
>>> new=np.sum(np.split(a,[10,20,]),axis=1)
>>> new
array([[[ 405,  415,  425],
        [ 435,  445,  455],
        [ 465,  475,  485]],

       [[1305, 1315, 1325],
        [1335, 1345, 1355],
        [1365, 1375, 1385]],

       [[2205, 2215, 2225],
        [2235, 2245, 2255],
        [2265, 2275, 2285]]])
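One caveat: as far as I can tell, calling np.sum directly on the list returned by np.split only works when all pieces have the same length, because the list first has to be converted into a single regular array. For pieces of unequal length, a hedged sketch is to sum each piece separately and stack the results. The split points [5, 20] below are arbitrary values picked for illustration.

```python
import numpy as np

a = np.arange(270).reshape(30, 3, 3)

# Unequal pieces: rows 0:5, 5:20 and 20:30.
pieces = np.split(a, [5, 20])

# Sum each piece on its own, then stack the (3, 3) results into one array.
sums = np.stack([p.sum(axis=0) for p in pieces])

print([p.shape[0] for p in pieces])  # [5, 15, 10]
print(sums.shape)                    # (3, 3, 3)

# Cross-check against np.add.reduceat with the matching start indices.
print(np.array_equal(sums, np.add.reduceat(a, [0, 5, 20], axis=0)))  # True
```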