Given an array 'array' and a set of indices 'indices', how do I find the cumulative sum of the sub-arrays formed by splitting the array along those indices in a vectorized manner? To clarify, suppose I have:
>>> array = np.arange(20)
>>> array
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
indices = np.arrray([3, 8, 14])
The operation should output:
array([0, 1, 3, 3, 7, 12, 18, 25, 8, 17, 27, 38, 50, 63, 14, 29, 45, 62, 80, 99])
Please note that the array is very big (100000 elements) and so, I need a vectorized answer. Using any loops would slow it down considerably. Also, if I had the same problem, but a 2D array and corresponding indices, and I need to do the same thing for each row in the array, how would I do it?
For the 2D version:
>>>array = np.arange(12).reshape((3,4))
>>>array
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> indices = np.array([[2], [1, 3], [1, 2]])
The output would be:
array([[ 0, 1, 3, 3],
[ 4, 9, 6, 13],
[ 8, 17, 10, 11]])
To clarify: Every row will be split.
You can use
np.split
to split your array along the indices then using python built in functionmap
apply thenp.cumsum()
to your sub arrays. And at the end by usingnp.hstack
convert the result to an integrated array:Note that since
map
is a built in function in python and has been implemented in C inside the Python interpreter it would performs better than a regular loop.1Here is an alternative for 2D arrays:
Note that your expected output is not based on the way that
np.split
works.If you want to such results you need to add 1 to your indices :
Due to a comment which said there is not performance difference between using generator expression and
map
function I ran a benchmark which demonstrates result better.Note that this doesn't mean that using map which performs in C speed makes that code preforms in C speed. That's because of that, the code has implemented in python and calling the function (first argument) and applying it on iterable items would take time.
You can introduce differentiation of originally cumulatively summed array at
indices
positions to create a boundary like effect at those places, such that when the differentiated array is cumulatively summed, gives us the indices-stopped cumulatively summed output. This might feel a bit contrived at first-look, but stick with it, try with other samples and hopefully would make sense! The idea is very similar to the one applied inthis other MATLAB solution.
So, following such a philosophy here's one approach usingnumpy.diff
along withcumulative summation
-This should be an almost vectorized solution, almost because even though we are calculating linear indices in a loop, but since it's not the computationally intensive part here, so it's effect on the total runtime would be minimal. Also, you can replace
arrayc
witharray
if you don't care about destructing the input for saving on memory.Sample input, output -