My code for slicing a numpy array (via fancy indexing) is very slow. It is currently a bottleneck in my program.
a.shape
(3218, 6)
ts = time.time(); a[rows][:, cols]; te = time.time(); print('%.8f' % (te-ts));
0.00200009
What is the correct numpy call to get an array consisting of the subset of rows 'rows' and columns 'cols' of the matrix a? (In fact, I need the transpose of this result.)
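Since the question ultimately wants the transpose, here is a minimal sketch that builds it in one advanced-indexing step by swapping which index array carries the extra axis. The array a and the index arrays rows and cols below are made-up stand-ins (random data with the question's shape), not the OP's actual data.

```python
import numpy as np

# Hypothetical data with the question's shape; rows/cols are 1-D integer index arrays.
rng = np.random.default_rng(0)
a = rng.random((3218, 6))
rows = np.sort(rng.choice(3218, size=1000, replace=False))
cols = np.array([0, 2, 5])

# Result in the question's orientation: shape (len(rows), len(cols)).
sub = a[rows[:, None], cols]

# Transposed result in a single indexing step: the index arrays have shapes
# (1, len(rows)) and (len(cols), 1), which broadcast to (len(cols), len(rows)),
# so element [j, i] is a[rows[i], cols[j]].
sub_t = a[rows[None, :], cols[:, None]]
```

Building the transposed index grid directly avoids materialising the untransposed block first; whether that is measurably faster than taking .T of it is something to check on real data.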
Let me try to summarize the excellent answers by Jaime and TheodrosZelleke and mix in some comments.
a[rows][:,cols]
implies two fancy indexing operations, so an intermediate copy a[rows] is created and discarded. Handy and readable, but not very efficient. Moreover, beware that [:,cols] usually generates a Fortran-contiguous copy from a C-contiguous source.
a[rows.reshape(-1,1),cols]
is a single advanced indexing expression, based on the fact that rows.reshape(-1,1) and cols are broadcast to the shape of the intended result.
A common experience is that indexing into a flattened array can be more efficient than fancy indexing, so another approach is to compute linear (row-major) indices from rows and cols and index the flattened array with them; the index grid can be laid out either in the orientation of the result or already transposed.
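A minimal sketch of the flattened-array approach, assuming a is a 2-D array and rows and cols are 1-D integer arrays (the data here is random and hypothetical). It uses a.flat, which indexes in row-major (C-style) order regardless of the memory layout, so the linear index of element (r, c) is r * a.shape[1] + c.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((3218, 6))                 # hypothetical C-ordered data
rows = rng.integers(0, a.shape[0], size=500)
cols = np.array([1, 3, 4])
ncols = a.shape[1]

# Broadcasting a column of row offsets against a row of column indices
# gives a (len(rows), len(cols)) grid of row-major linear indices.
lin = rows[:, None] * ncols + cols
sub = a.flat[lin]                         # shape (len(rows), len(cols))

# Putting the new axis on cols instead yields the transpose directly.
lin_t = rows * ncols + cols[:, None]
sub_t = a.flat[lin_t]                     # shape (len(cols), len(rows))
```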
Efficiency will depend on memory access patterns and on whether the starting array is C-contiguous or Fortran-contiguous, so experimentation is needed.
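As a starting point for that experimentation, a rough timing harness might look like this. The variant names, array sizes and repeat count are arbitrary choices for illustration; it runs each candidate on a C-ordered array and on a Fortran-ordered copy made with np.asfortranarray.

```python
import timeit

import numpy as np

rng = np.random.default_rng(2)
a_c = rng.random((3218, 6))            # C-contiguous (row-major)
a_f = np.asfortranarray(a_c)           # same values, Fortran-contiguous (column-major)
rows = rng.integers(0, 3218, size=1000)
cols = np.array([0, 2, 5])

# Candidate expressions; each returns a (len(rows), len(cols)) array.
variants = {
    "two-step": lambda a: a[rows][:, cols],
    "broadcast": lambda a: a[rows[:, None], cols],
    "flat": lambda a: a.flat[rows[:, None] * a.shape[1] + cols],
}


def benchmark(number=200):
    """Return {(layout, variant): seconds per call} for both memory layouts."""
    results = {}
    for layout, arr in (("C", a_c), ("F", a_f)):
        for name, fn in variants.items():
            total = timeit.timeit(lambda: fn(arr), number=number)
            results[(layout, name)] = total / number
    return results


if __name__ == "__main__":
    for key, sec in sorted(benchmark().items()):
        print(key, f"{sec * 1e6:.1f} us")
```

The absolute numbers depend on the machine and NumPy version, so only the relative ordering on your own data is meaningful.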
Use fancy indexing only if really needed: basic slicing
a[rstart:rstop:rstep, cstart:cstop:cstep]
returns a view (although not contiguous) and should be faster!
You can get some speedup if you slice using fancy indexing and broadcasting.
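A minimal sketch contrasting the two forms; the function names slice_two_step and slice_broadcast are hypothetical, and the random rs/cs index arrays only mimic the question's shape. Both return the same values; only the number of indexing operations differs.

```python
import numpy as np


def slice_two_step(a, rs, cs):
    # Two advanced-indexing operations: a[rs] materialises an intermediate
    # copy of the selected rows, which is then indexed again by column.
    return a[rs][:, cs]


def slice_broadcast(a, rs, cs):
    # One advanced-indexing operation: rs[:, None] has shape (len(rs), 1) and
    # broadcasts against cs (shape (len(cs),)) to the shape of the result.
    return a[rs[:, None], cs]


rng = np.random.default_rng(3)
a = rng.random((3218, 6))
rs = np.unique(rng.integers(0, 3218, size=1609))
cs = np.unique(rng.integers(0, 6, size=3))
```

Timing both with timeit on your actual data is the way to see whether the difference matters for your program.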
If you think in terms of percentages, doing something 15% faster is always good, but on my system, for the size of your array, this takes 40 µs less to do the slicing, and it is hard to believe that an operation taking 240 µs will be your bottleneck.
To my surprise, a somewhat lengthy expression that first calculates linear 1-D indices and then indexes the flattened array is more than 50% faster than the consecutive array indexing presented in the question.
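The linear-index idea can be sketched as a function; select_linear is a hypothetical name, and the code assumes a 2-D array with 1-D integer index arrays. It indexes a raveled array with a flattened grid of row-major indices and then reshapes the result.

```python
import numpy as np


def select_linear(a, rows, cols):
    """Return the same values as a[rows][:, cols] via precomputed linear indices."""
    ncols = a.shape[1]
    # (len(rows), 1) column of row offsets + (len(cols),) column indices
    # -> (len(rows), len(cols)) grid of row-major linear indices.
    lin = cols + (rows * ncols).reshape(-1, 1)
    # ravel() returns a view when the memory layout allows it (e.g. C-contiguous
    # input) and a C-ordered copy otherwise; index it with the flattened grid,
    # then restore the 2-D shape.
    return a.ravel()[lin.ravel()].reshape(rows.size, cols.size)
```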
UPDATE: The OP updated the description of the shape of the initial array. With the updated size, the speedup is now above 99%.
INITIAL ANSWER: The transcript timed method 1 and method 2 and then checked that the results are actually the same.
Using np.ix_ you can get a similar speed to ravel/reshape, but with code that is clearer.
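A minimal sketch of the np.ix_ version, with random stand-in data of the question's shape. np.ix_ builds an open mesh from the 1-D index arrays, so a single advanced index selects the whole rows-by-cols block.

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.random((3218, 6))
rows = rng.integers(0, 3218, size=1000)
cols = np.array([0, 2, 5])

# np.ix_ reshapes the index arrays to (len(rows), 1) and (1, len(cols)),
# which broadcast to the (len(rows), len(cols)) block.
ri, ci = np.ix_(rows, cols)
sub = a[np.ix_(rows, cols)]    # shape (len(rows), len(cols))
sub_t = sub.T                  # the transpose the question asks for (a view of sub)
```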