2D Numpy Array Fancy Indexing + Masking

Posted 2019-09-12 17:45

Question:

I have:

import numpy as np
a = np.array([[ 4, 99,  2],
              [ 3,  4, 99],
              [ 1,  8,  7],
              [ 8,  6,  8]])

Why is

a[[True, True, False, False], [1,2]]

Equal to

array([99, 99])

And not

array([[99,  2],
       [ 4, 99]])

Since I am selecting the first two rows using a boolean mask and the 2nd and 3rd columns using fancy indexing? Especially since calling

a[[True, True, False, False],:][:, [1,2]]

gives me my expected result. I'm guessing it's some sort of broadcasting rule, but it isn't apparent to me. Thanks!

Answer 1:

A boolean array or list is evaluated as though np.where had first converted it to an integer index array:

In [285]: a[[True,True,False,False],[1,2]]
Out[285]: array([99, 99])

In [286]: a[np.where([True,True,False,False]),[1,2]]
Out[286]: array([[99, 99]])

In [287]: np.where([True,True,False,False])
Out[287]: (array([0, 1], dtype=int32),)

In [288]: a[[0,1], [1,2]]
Out[288]: array([99, 99])

So this is picking a[0,1] and a[1,2], a 'pair-wise' selection.
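The pair-wise behaviour can be checked directly. The following is a minimal sketch, assuming a reasonably recent NumPy; the names mask, cols, rows, pairwise and manual are introduced here for illustration and are not part of the original post.

```python
import numpy as np

a = np.array([[4, 99, 2],
              [3, 4, 99],
              [1, 8, 7],
              [8, 6, 8]])

mask = np.array([True, True, False, False])
cols = np.array([1, 2])

# The boolean mask behaves like the integer row indices it selects.
rows = np.nonzero(mask)[0]

# Row and column indices are paired element by element: (0, 1) and (1, 2).
pairwise = a[rows, cols]
manual = np.array([a[r, c] for r, c in zip(rows, cols)])

print(rows)      # integer equivalent of the mask
print(pairwise)  # same values as a[mask, cols]
print(manual)    # same values, built one pair at a time
```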

To select the block instead, index with arrays (or list equivalents) that broadcast against each other to produce a (2,2) array:

In [289]: a[np.ix_([0,1], [1,2])]
Out[289]: 
array([[99,  2],
       [ 4, 99]])
In [290]: a[[[0],[1]], [1,2]]
Out[290]: 
array([[99,  2],
       [ 4, 99]])

This case is equivalent to two-stage indexing: a[[0,1],:][:,[1,2]]
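The same block can also be selected starting from the boolean mask in the question. This is a sketch, assuming a NumPy version in which np.ix_ accepts boolean sequences (it documents them as equivalent to passing np.nonzero of the mask); the variable names are illustrative.

```python
import numpy as np

a = np.array([[4, 99, 2],
              [3, 4, 99],
              [1, 8, 7],
              [8, 6, 8]])

mask = np.array([True, True, False, False])
cols = [1, 2]

# Option 1: np.ix_ builds an open mesh from the mask and the column list.
block_ix = a[np.ix_(mask, cols)]

# Option 2: convert the mask to row indices and add an axis, so the
# (2, 1) row index broadcasts against the (2,) column index.
rows = np.nonzero(mask)[0]
block_bc = a[rows[:, None], cols]

# Option 3: two-stage indexing, as in the question.
block_two = a[mask, :][:, cols]

print(block_ix)
```

All three give the 2x2 block; the first two do it in a single indexing step, while the third creates an intermediate array.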

I'm using NumPy 1.12. Boolean indexing has changed over recent releases. For example, if the length of the boolean index doesn't match the dimension, the expression still runs but gives a warning (this part is new).

In [349]: a[[True,True,False],[1,2]]
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 3
  #!/usr/bin/python3
Out[349]: array([99, 99])

Changes for v1.13 are described in:

https://docs.scipy.org/doc/numpy-dev/release.html#boolean-indexing-changes
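For readers on a newer installation: as far as I know, current NumPy releases no longer just warn about a boolean index of the wrong length but raise an IndexError. The sketch below assumes such a recent version; short_mask and raised are names made up for this example.

```python
import numpy as np

a = np.array([[4, 99, 2],
              [3, 4, 99],
              [1, 8, 7],
              [8, 6, 8]])

# Three entries, but a has four rows (assumed to be rejected by recent NumPy).
short_mask = [True, True, False]

try:
    a[short_mask, [1, 2]]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```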



Answer 2:

I think it works like the following:

In [284]: a
Out[284]: 
array([[ 4, 99,  2],
       [ 3,  4, 99],
       [ 1,  8,  7],
       [ 8,  6,  8]])

In [286]: bo
Out[286]: array([ True,  True, False, False], dtype=bool)

In [287]: boc
Out[287]: array([1, 2])

Now, once we index a with the boolean mask bo, we get:

In [285]: a[bo]
Out[285]: 
array([[ 4, 99,  2],
       [ 3,  4, 99]])

Since bo is True only in its first two positions (equivalent to [1, 1, 0, 0]), this selects just the first two rows of a.

Now we apply boc, i.e. [1, 2], in combination with the row-selecting mask bo:

In [288]: a[bo, boc]
Out[288]: array([99, 99])

Here, boc is an integer index array rather than a mask, and it is paired element-wise with the rows selected by bo: it picks the second element from the first row and the third element from the second row, yielding [99, 99].
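One way to see that the column indices are matched one-to-one with the selected rows, rather than applied to every selected row, is to make the counts differ. This is a small sketch; mask3, failed and two_stage are hypothetical names used only here.

```python
import numpy as np

a = np.array([[4, 99, 2],
              [3, 4, 99],
              [1, 8, 7],
              [8, 6, 8]])

# Three rows selected, but only two column indices.
mask3 = np.array([True, True, True, False])

try:
    a[mask3, [1, 2]]
    failed = False
except IndexError as exc:
    # The (3,) row indices cannot be broadcast against the (2,) column indices.
    failed = True
    print("IndexError:", exc)

# Two-stage indexing has no such constraint: every selected row
# keeps columns 1 and 2.
two_stage = a[mask3][:, [1, 2]]
print(two_stage)
```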

But, interestingly if you do something like:

In [289]: a[1, [1, 2]]
Out[289]: array([ 4, 99])

In this case, NumPy broadcasts the scalar row index 1 against [1, 2], yielding the index pairs [(1,1), (1,2)].
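The broadcast can be made explicit with np.broadcast_arrays, which shows the index pairs NumPy effectively uses. This is an illustrative sketch; rows, cols and pairs are names chosen for the example.

```python
import numpy as np

a = np.array([[4, 99, 2],
              [3, 4, 99],
              [1, 8, 7],
              [8, 6, 8]])

# Broadcast the scalar row index against the list of column indices.
rows, cols = np.broadcast_arrays(1, [1, 2])
pairs = list(zip(rows.tolist(), cols.tolist()))

print(pairs)          # the (row, column) pairs that get looked up
print(a[rows, cols])  # same values as a[1, [1, 2]]
```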