I have an operation that I'm doing commonly which I'm calling a "jagged-slice" because I don't know the real name for it. It's best explained by example:
import numpy as np
a = np.random.randn(50, 10)
entries_of_interest = np.random.randint(10, size=50)  # Vector of 50 indices between 0 and 9
# Now I want the value contained in each row of a at the corresponding index in entries_of_interest
jagged_slice_of_a = a[np.arange(a.shape[0]), entries_of_interest]
# jagged_slice_of_a is now a vector with 50 elements. Good.
The only problem is that it's a bit cumbersome to do this a[np.arange(a.shape[0]), entries_of_interest] indexing (it seems silly to have to construct the np.arange(a.shape[0]) just for the sake of this). I'd like something like the : operator for this, but the : does something else. Is there any more succinct way to do this operation?
Best answer:
No, there is no better way with native numpy. You can create a helper function for this if you want.
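A minimal sketch of such a helper, assuming a 2-D array and exactly one column index per row. The name jagged_slice is hypothetical (it borrows the question's own label and is not a NumPy function).

```python
import numpy as np

def jagged_slice(a, cols):
    """Return a[i, cols[i]] for every row i of a 2-D array.

    jagged_slice is a hypothetical helper name, not part of NumPy.
    """
    a = np.asarray(a)
    cols = np.asarray(cols)
    if a.ndim != 2 or cols.shape != (a.shape[0],):
        raise ValueError("expected a 2-D array and one column index per row")
    # Pair each row index with its requested column index.
    return a[np.arange(a.shape[0]), cols]

a = np.arange(12).reshape(3, 4)
print(jagged_slice(a, [1, 0, 3]))  # [ 1  4 11]
```

In NumPy versions that provide np.take_along_axis, the expression np.take_along_axis(a, cols[:, None], axis=1)[:, 0] (with cols as an integer array) should give the same result, although it is not any shorter to type.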
I think that your current method is probably the best way.
You can also use choose for this kind of selection. This is syntactically clearer, but is trickier to get right and potentially more limited. The equivalent with this method would be: entries_of_interest.choose(a.T)
The elements in jagged_slice_of_a are the diagonal elements of a[:, entries_of_interest]. A slightly less cumbersome way of doing this would therefore be to use np.diagonal to extract them.
This is cumbersome only in the sense that it requires more typing for a task that seems so simple to you.
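For comparison, the choose and np.diagonal alternatives mentioned in the answers above can be checked against the original indexing expression. This is only a sketch: the seeded generator is an assumption made for repeatability, and the comments note the limitation of each approach.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
a = rng.standard_normal((50, 10))
entries_of_interest = rng.integers(10, size=50)  # 50 indices in 0..9

# Reference result from the question.
expected = a[np.arange(a.shape[0]), entries_of_interest]

# choose: a.T is read as 10 candidate arrays of length 50, and the index
# array picks one candidate per position. NumPy caps the number of
# candidates, so this only works for arrays with few columns.
via_choose = entries_of_interest.choose(a.T)

# diagonal: a[:, entries_of_interest] is a full (50, 50) array, so this
# builds n*n values only to keep n of them.
via_diagonal = np.diagonal(a[:, entries_of_interest])

print(np.array_equal(expected, via_choose))    # True
print(np.array_equal(expected, via_diagonal))  # True
```

Both give the same 50 values here, but for large arrays the original a[np.arange(a.shape[0]), entries_of_interest] form avoids the column limit of choose and the quadratic intermediate of the diagonal approach.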
But as you note, the syntactically simpler a[:, entries_of_interest] has another interpretation in numpy. Choosing a subset of the columns of an array is a more common task than choosing one (random) item from each row.
Your case is just a specialized instance of a[I, J], where I and J are 2 arrays of the same shape. In the general case entries_of_interest could be smaller than a.shape[0] (not all the rows), or larger (several items from some rows), or even be 2d. It could even select certain elements repeatedly.
I have found in other SO questions that performing this kind of element selection is faster when applied to a.flat. But that requires some math to construct the I*n+J kind of flat index.
With your special knowledge of J, constructing I seems extra work, but numpy can't make that kind of assumption. If this selection was more common someone could write a function that wraps your expression.
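The general a[I, J] form and the flat-index variant described above can be sketched on a small array. The array and index values are made up for illustration, and n is assumed to be the number of columns, so that I*n+J is the row-major position of element (I, J). Whether the flat version is actually faster depends on array sizes and NumPy version, so it is worth timing on your own data.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, n = 4 columns
n = a.shape[1]

# General case: I and J are index arrays of the same shape. Rows may be
# skipped or repeated, and the result takes the shape of I and J.
I = np.array([0, 2, 2])
J = np.array([1, 0, 3])
print(a[I, J])            # [ 1  8 11]

# The same elements through a row-major flat index.
print(a.flat[I * n + J])  # [ 1  8 11]

# 2-D index arrays work the same way; the result here has shape (2, 2).
I2 = np.array([[0, 1], [2, 2]])
J2 = np.array([[3, 3], [0, 1]])
print(a[I2, J2].shape)    # (2, 2)
```

The question's expression is the special case I = np.arange(a.shape[0]), J = entries_of_interest.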