What is np.compress doing internally that makes it faster than boolean indexing? In this example, compress is ~20% faster. The time savings varies with the size of a and the number of True values in the boolean array b, but on my machine compress is always faster.
import numpy as np
a = np.random.rand(1000000, 4)
b = a[:, 0] > 0.5
%timeit a[b]
#>>> 10 loops, best of 3: 24.7 ms per loop
%timeit a.compress(b, axis=0)
#>>> 10 loops, best of 3: 20 ms per loop
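The same comparison can be reproduced outside IPython with the standard timeit module; a minimal sketch (timings will differ by machine):

```python
import numpy as np
from timeit import timeit

a = np.random.rand(1000000, 4)
b = a[:, 0] > 0.5

# both methods select exactly the same rows; only the speed differs
assert np.array_equal(a[b], a.compress(b, axis=0))

t_bool = timeit(lambda: a[b], number=20)
t_comp = timeit(lambda: a.compress(b, axis=0), number=20)
print(f"boolean indexing: {t_bool:.3f}s  compress: {t_comp:.3f}s")
```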
The documentation for boolean indexing says "What is returned is a copy of the data, not a view as one gets with slices." In contrast, the compress docs say "Return selected slices of an array along given axis."
However, using the method provided here for determining whether two arrays share the same data buffer shows that neither method shares data with its parent a, which I take to mean that neither method returns an actual slice (view).
def get_data_base(arr):
    base = arr
    while isinstance(base.base, np.ndarray):
        base = base.base
    return base

def arrays_share_data(x, y):
    return get_data_base(x) is get_data_base(y)
arrays_share_data(a, a.compress(b, axis=0))
#>>> False
arrays_share_data(a, a[b])
#>>> False
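As a cross-check not in the original post: NumPy 1.11+ also ships np.shares_memory, which tests for overlapping buffers directly and gives the same answer:

```python
import numpy as np

a = np.random.rand(1000, 4)
b = a[:, 0] > 0.5

# np.shares_memory checks whether two arrays overlap in memory
print(np.shares_memory(a, a[b]))                   # False: boolean indexing copies
print(np.shares_memory(a, a.compress(b, axis=0)))  # False: compress copies too
print(np.shares_memory(a, a[:10]))                 # True: a slice is a view
```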
I am simply curious because I perform these operations frequently in my work. I run Python 3.5.2 and NumPy 1.11.1, installed via Anaconda.
Tracing a.compress through several layers of function calls in the numpy source on GitHub, I get to /numpy/core/src/multiarray/item_selection.c:
PyArray_Compress(PyArrayObject *self, PyObject *condition, int axis,
                 PyArrayObject *out)
{
    /* various checks */
    res = PyArray_Nonzero(cond);
    ret = PyArray_TakeFrom(self, PyTuple_GET_ITEM(res, 0), axis,
                           out, NPY_RAISE);
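In other words, the C code above amounts to nonzero followed by take. A Python-level sketch of the same logic (compress_equivalent is a name I made up for illustration):

```python
import numpy as np

def compress_equivalent(arr, condition, axis=0):
    """Sketch of what PyArray_Compress does: find the True
    positions with nonzero, then take those indices."""
    idx = np.nonzero(np.asarray(condition))[0]
    return arr.take(idx, axis=axis)

a = np.random.rand(1000, 4)
b = a[:, 0] > 0.5
# same result as the built-in compress
assert np.array_equal(compress_equivalent(a, b), a.compress(b, axis=0))
```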
With your sample arrays, compress is the same as calling where (or nonzero) to get an index array, and then take:
In [135]: a.shape
Out[135]: (1000000, 4)
In [136]: b.shape
Out[136]: (1000000,)
In [137]: a.compress(b, axis=0).shape
Out[137]: (499780, 4)
In [138]: a.take(np.nonzero(b)[0], axis=0).shape
Out[138]: (499780, 4)
In [139]: timeit a.compress(b, axis=0).shape
100 loops, best of 3: 14.3 ms per loop
In [140]: timeit a.take(np.nonzero(b)[0], axis=0).shape
100 loops, best of 3: 14.3 ms per loop
In fact, if I use this index array with [] indexing, I get comparable times:
In [141]: idx=np.where(b)[0]
In [142]: idx.shape
Out[142]: (499780,)
In [143]: timeit a[idx,:].shape
100 loops, best of 3: 14.6 ms per loop
In [144]: timeit np.take(a,idx, axis=0).shape
100 loops, best of 3: 9.9 ms per loop
The np.take code is more involved, since it also handles the clip and wrap modes.
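Those modes govern what happens to out-of-range indices; a quick illustration:

```python
import numpy as np

a = np.arange(5) * 10   # [0, 10, 20, 30, 40]

# the index 7 is out of range for a length-5 array; each mode
# resolves it differently (the default, mode='raise', would error)
print(np.take(a, [1, 7], mode='clip'))   # [10, 40]  7 clipped to last index 4
print(np.take(a, [1, 7], mode='wrap'))   # [10, 20]  7 % 5 == 2
```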
[] indexing gets translated into a __getitem__ call and passes through various layers. I haven't traced that code very far, but I think it's safe to say that compress (or rather take) just takes a more direct route to the task, and thus gets a modest speed increase. A speed difference of 30-50% suggests differences in compiled-code details, not something major like views vs copies, or interpreted vs compiled.
When the indices to select along one axis are given by a boolean mask, compress is an alternative to fancy indexing. The noticeable speed gain comes from the axis of selection being pre-specified, whereas fancy indexing supports arbitrary selections from an array and incurs a performance penalty to make this possible. This also explains the variable speed gain you observed.
n = len(a)  # a as defined in the question
i = np.random.random_sample(n) < .5
b1 = a[i]
b2 = np.compress(i, a, axis=0)
%timeit a[i]
10 loops, best of 3: 59.8 ms per loop
%timeit np.compress(i, a, axis=0)
10 loops, best of 3: 24.1 ms per loop
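To make the contrast concrete, a small sketch (my own example, not from the timings above) of the two selection styles:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])

# compress: always selects whole slices along one pre-specified axis
print(np.compress(mask, a, axis=0))   # rows 0 and 2

# fancy indexing: must support far more general selections, e.g. a
# boolean mask over every element, which flattens to a 1-D result
print(a[a % 2 == 0])                  # every even element
```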