What is np.compress doing internally that makes it faster than boolean indexing? In this example, compress is ~20% faster. The time savings varies with the size of a and the number of True values in the boolean array b, but on my machine compress is always faster.
import numpy as np
a = np.random.rand(1000000, 4)
b = a[:, 0] > 0.5
%timeit a[b]
#>>> 10 loops, best of 3: 24.7 ms per loop
%timeit a.compress(b, axis=0)
#>>> 10 loops, best of 3: 20 ms per loop
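The same comparison can be reproduced outside IPython with the standard timeit module; a minimal sketch (timings will differ by machine):

```python
import numpy as np
from timeit import timeit

a = np.random.rand(1000000, 4)
b = a[:, 0] > 0.5

# both methods select exactly the same rows; only the speed differs
assert np.array_equal(a[b], a.compress(b, axis=0))

t_bool = timeit(lambda: a[b], number=20)
t_comp = timeit(lambda: a.compress(b, axis=0), number=20)
print(f"boolean indexing: {t_bool:.3f}s  compress: {t_comp:.3f}s")
```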
The documentation for boolean indexing says "What is returned is a copy of the data, not a view as one gets with slices." In contrast, the compress docs say "Return selected slices of an array along given axis."
However, using the method provided here for determining whether two arrays share the same data buffer shows that neither method shares data with its parent a, which I take to mean that neither method returns an actual slice (view).
def get_data_base(arr):
    base = arr
    while isinstance(base.base, np.ndarray):
        base = base.base
    return base

def arrays_share_data(x, y):
    return get_data_base(x) is get_data_base(y)
arrays_share_data(a, a.compress(b, axis=0))
#>>> False
arrays_share_data(a, a[b])
#>>> False
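As a cross-check not in the original post: NumPy 1.11+ also ships np.shares_memory, which tests for overlapping buffers directly and gives the same answer:

```python
import numpy as np

a = np.random.rand(1000, 4)
b = a[:, 0] > 0.5

# np.shares_memory checks whether two arrays overlap in memory
print(np.shares_memory(a, a[b]))                   # False: boolean indexing copies
print(np.shares_memory(a, a.compress(b, axis=0)))  # False: compress copies too
print(np.shares_memory(a, a[:10]))                 # True: a slice is a view
```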
I am simply curious because I perform these operations frequently in my work. I run Python 3.5.2 and NumPy 1.11.1, installed via Anaconda.
Tracing a.compress through several layers of function calls in the numpy source on GitHub, I get to /numpy/core/src/multiarray/item_selection.c:
PyArray_Compress(PyArrayObject *self, PyObject *condition, int axis,
                 PyArrayObject *out)
{
    /* various checks */
    res = PyArray_Nonzero(cond);
    ret = PyArray_TakeFrom(self, PyTuple_GET_ITEM(res, 0), axis,
                           out, NPY_RAISE);
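In other words, the C code above amounts to nonzero followed by take. A Python-level sketch of the same logic (compress_equivalent is a name I made up for illustration):

```python
import numpy as np

def compress_equivalent(arr, condition, axis=0):
    """Sketch of what PyArray_Compress does: find the True
    positions with nonzero, then take those indices."""
    idx = np.nonzero(np.asarray(condition))[0]
    return arr.take(idx, axis=axis)

a = np.random.rand(1000, 4)
b = a[:, 0] > 0.5
# same result as the built-in compress
assert np.array_equal(compress_equivalent(a, b), a.compress(b, axis=0))
```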
With your sample arrays, compress is the same as calling where (or nonzero) to get an index array, and then take:
In [135]: a.shape
Out[135]: (1000000, 4)
In [136]: b.shape
Out[136]: (1000000,)
In [137]: a.compress(b, axis=0).shape
Out[137]: (499780, 4)
In [138]: a.take(np.nonzero(b)[0], axis=0).shape
Out[138]: (499780, 4)
In [139]: timeit a.compress(b, axis=0).shape
100 loops, best of 3: 14.3 ms per loop
In [140]: timeit a.take(np.nonzero(b)[0], axis=0).shape
100 loops, best of 3: 14.3 ms per loop
In fact, if I use this index array with [] indexing, I get comparable times:
In [141]: idx=np.where(b)[0]
In [142]: idx.shape
Out[142]: (499780,)
In [143]: timeit a[idx,:].shape
100 loops, best of 3: 14.6 ms per loop
In [144]: timeit np.take(a,idx, axis=0).shape
100 loops, best of 3: 9.9 ms per loop
The np.take code is more involved, since it also handles the clip and wrap modes.
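Those modes govern what happens to out-of-range indices; a quick illustration:

```python
import numpy as np

a = np.arange(5) * 10   # [0, 10, 20, 30, 40]

# the index 7 is out of range for a length-5 array; each mode
# resolves it differently (the default, mode='raise', would error)
print(np.take(a, [1, 7], mode='clip'))   # [10, 40]  7 clipped to last index 4
print(np.take(a, [1, 7], mode='wrap'))   # [10, 20]  7 % 5 == 2
```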
[] indexing gets translated into a __getitem__ call and passes through various layers. I haven't traced that code very far, but I think it's safe to say that compress (or rather take) just takes a more direct route to the task, and thus gets a modest speed increase. A speed difference of 30-50% suggests differences in compiled-code details, not something major like views vs copies, or interpreted vs compiled.
When the indices to select along one axis are given by a boolean mask, compress is an alternative to fancy indexing. The noticeable speed gain comes from the axis of selection being pre-specified, whereas fancy indexing supports arbitrary selections from an array and incurs a performance penalty to make this possible. This also explains the variable speed gain you observed.
n = len(a)  # a as defined in the question
i = np.random.random_sample(n) < .5
b1 = a[i]
b2 = np.compress(i, a, axis=0)
%timeit a[i]
10 loops, best of 3: 59.8 ms per loop
%timeit np.compress(i, a, axis=0)
10 loops, best of 3: 24.1 ms per loop
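To make the contrast concrete, a small sketch (my own example, not from the timings above) of the two selection styles:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])

# compress: always selects whole slices along one pre-specified axis
print(np.compress(mask, a, axis=0))   # rows 0 and 2

# fancy indexing: must support far more general selections, e.g. a
# boolean mask over every element, which flattens to a 1-D result
print(a[a % 2 == 0])                  # every even element
```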