When does getting submatrix of a numpy array retu

Posted 2020-07-29 18:01

Question:

I'm trying to get a submatrix of a 2D numpy array and modify it. Sometimes I get a copy, so modifying it does not affect the original array:

In [650]: d=np.random.rand(5,5)

In [651]: may_share_memory(d, d[[0,1],:][:,[2,3]])
Out[651]: False

In [652]: d[[0,1],:][:,[2,3]]=2

In [653]: d
Out[653]: 
array([[ 0.0648922 ,  0.41408311,  0.88024646,  0.22471181,  0.81811439],
       [ 0.32154096,  0.88349028,  0.30755883,  0.55301128,  0.61138144],
       [ 0.18398833,  0.40208368,  0.69888324,  0.93197147,  0.43538379],
       [ 0.55633382,  0.80531999,  0.71486132,  0.4186339 ,  0.76487239],
       [ 0.81193408,  0.4951559 ,  0.97713937,  0.33904998,  0.27660239]])

while at other times I seem to get a view, even though may_share_memory also returns False:

In [662]: d[np.ix_([0,1],[2,3])]=1

In [663]: d
Out[663]: 
array([[ 0.0648922 ,  0.41408311,  1.        ,  1.        ,  0.81811439],
       [ 0.32154096,  0.88349028,  1.        ,  1.        ,  0.61138144],
       [ 0.18398833,  0.40208368,  0.69888324,  0.93197147,  0.43538379],
       [ 0.55633382,  0.80531999,  0.71486132,  0.4186339 ,  0.76487239],
       [ 0.81193408,  0.4951559 ,  0.97713937,  0.33904998,  0.27660239]])

In [664]: may_share_memory(d, d[np.ix_([0,1],[2,3])])
Out[664]: False

What's stranger is that if I assign that 'view' to a variable, it becomes a 'copy' (again, modifying it does not affect the original array):

In [658]: d2=d[np.ix_([0,1],[2,3])]

In [659]: may_share_memory(d,d2)
Out[659]: False

In [660]: d2+=1

In [661]: d
Out[661]: 
array([[ 0.0648922 ,  0.41408311,  0.88024646,  0.22471181,  0.81811439],
       [ 0.32154096,  0.88349028,  0.30755883,  0.55301128,  0.61138144],
       [ 0.18398833,  0.40208368,  0.69888324,  0.93197147,  0.43538379],
       [ 0.55633382,  0.80531999,  0.71486132,  0.4186339 ,  0.76487239],
       [ 0.81193408,  0.4951559 ,  0.97713937,  0.33904998,  0.27660239]])

Answer 1:

What you're seeing is the difference between "fancy" indexing and normal indexing.

Also, for clarity, d[np.ix_([0,1],[2,3])] = 1 is not a view, it's an assignment. See @EelcoHoogendoorn's answer for more explanation in that regard. The root of your confusion seems to be with __setitem__ vs __getitem__, which Eelco addresses, but I thought I'd add a few numpy-specific clarifications.

Any time you index with a sequence of coordinates (np.ix_ returns a tuple of index arrays), it's "fancy" indexing and will always return a copy.

Anything you can do with basic slicing will always return a view.

For example:

In [1]: import numpy as np

In [2]: x = np.arange(10)

In [3]: y = x[3:5]

In [4]: z = x[[3, 4]]

In [5]: z[0] = 100

In [5]: x
Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [6]: y[0] = 100

In [7]: x
Out[7]: array([  0,   1,   2, 100,   4,   5,   6,   7,   8,   9])

The reason for this is that numpy arrays have to be semi-contiguous in memory (more precisely, they have to be able to be described by an offset, strides, and shape).

Any type of slicing can be described this way (even something like x[:, 3:100:5, None]).

An arbitrary sequence of coordinates (e.g. x[[1, 4, 5, 100]]) cannot be.
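To make the offset/strides/shape point concrete, here is a minimal sketch. The array x and the chosen indices are illustrative only, and the dtype is pinned to int64 so the byte strides are the same on every platform:

```python
import numpy as np

# Illustrative 3x4 array; int64 means each element is 8 bytes.
x = np.arange(12, dtype=np.int64).reshape(3, 4)

# Slicing: every second column starting at column 1.
# Same buffer, new offset, and the column stride doubles from 8 to 16 bytes.
v = x[:, 1::2]
print(x.strides, v.strides)      # (32, 8) (32, 16)
print(np.shares_memory(x, v))    # True: v is a view

# Fancy indexing: columns 0, 1, 3 are not evenly spaced, so no single
# stride can describe them and numpy has to allocate a new array.
f = x[:, [0, 1, 3]]
print(np.shares_memory(x, f))    # False: f is a copy
```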

Therefore, numpy always returns a view if slicing is used and a copy if "fancy indexing" (a.k.a. using a sequence of indices or a boolean mask) is used.

Assignments (e.g. x[blah] = y), however, will always modify a numpy array in-place.
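Applied to the cases in the question, this can be sketched as follows. It uses a zero-filled array instead of random data so the effect of each statement is easy to read off; the variable names after_chained, after_single and after_mixed are just for illustration:

```python
import numpy as np

d = np.zeros((5, 5))

# Chained fancy indexing: d[[0, 1], :] is evaluated first and returns a
# copy, so the assignment lands in that temporary copy and d is unchanged.
d[[0, 1], :][:, [2, 3]] = 2
after_chained = d.sum()
print(after_chained)   # 0.0

# A single indexed assignment writes into d directly.
d[np.ix_([0, 1], [2, 3])] = 1
after_single = d.sum()
print(after_single)    # 4.0

# If the first step is a slice, it returns a view, so the assignment
# through that view does reach d.
d[0:2, :][:, [2, 3]] = 3
after_mixed = d.sum()
print(after_mixed)     # 12.0
```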



Answer 2:

I agree; this is strange. Yet there is a logic to it.

Note that a sliced assignment is a special overloaded method in Python (__setitem__). A sliced assignment doesn't create a view and then write to it; it writes to the array directly. You can't create a view of an ndarray for a[[2,0,1]], because you can't express this view as a strided array, which is the fundamental interface all numpy functions demand. But you can directly consume the indices and act on them. Arguably, for consistency, such a sliced assignment should modify a copy; but what would be the point of that, if you don't bind the newly created array to a new name?
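The dispatch itself can be observed without numpy. The sketch below uses a hypothetical Logged class (not part of numpy) that only records which hook Python calls for each statement:

```python
log = []

class Logged:
    """Hypothetical stand-in for an array; records which hook is called."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        log.append((self.name, "__getitem__"))
        # Return a new object, the way fancy indexing returns a new array.
        return Logged(self.name + "[...]")

    def __setitem__(self, key, value):
        log.append((self.name, "__setitem__"))

a = Logged("a")

a[[2, 0, 1]] = 5         # one __setitem__ call on a, no __getitem__ at all
a[[0, 1]][[2, 3]] = 7    # __getitem__ on a, then __setitem__ on the result

for entry in log:
    print(entry)
# ('a', '__setitem__')
# ('a', '__getitem__')
# ('a[...]', '__setitem__')
```

In the second statement the write goes to the temporary object returned by __getitem__, which is why the chained form in the question leaves the original array untouched.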

It is somewhat awkward in Python in general that assignment and sliced assignment are completely different beasts, which do completely different things. That is also what is at the root of this. Sliced assignment and slicing on the right-hand side call different functions, and are conceptually somewhat distinct. may_share_memory refers to the behavior of right-hand-side slicing, not sliced assignment.
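The d2 += 1 case from the question follows the same split. A minimal sketch, with a small zero-filled array and an illustrative index idx chosen so the result is easy to check:

```python
import numpy as np

d = np.zeros((3, 3))
idx = np.ix_([0, 1], [1, 2])

# Right-hand-side fancy indexing: d2 is a fresh copy bound to a new name.
d2 = d[idx]
d2 += 1                      # in-place add on the copy only
sum_after_copy = d.sum()
print(sum_after_copy)        # 0.0
print(np.shares_memory(d, d2))   # False

# Augmented assignment on the indexed expression: Python reads d[idx]
# (a copy), adds 1 to it, then writes the result back with d[idx] = ...
d[idx] += 1
sum_after_setitem = d.sum()
print(sum_after_setitem)     # 4.0
```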