I want to understand the NumPy behavior.
When I take a reference to an inner array of a NumPy array and then compare it with the same indexing expression using is, the result is False.
Here is the example:
In [198]: x = np.array([[1,2,3], [4,5,6]])
In [201]: x0 = x[0]
In [202]: x0 is x[0]
Out[202]: False
On the other hand, with native Python objects the result is True:
In [205]: c = [[1,2,3],[1]]
In [206]: c0 = c[0]
In [207]: c0 is c[0]
Out[207]: True
My question: is this the intended behavior of NumPy? If so, what should I do if I want to create a reference to the inner objects of a NumPy array?
2d slicing
When I first wrote this I constructed and indexed a 1d array. But the OP is working with a 2d array, so x[0] is a 'row', a slice of the original.
What I wrote earlier about slices still applies. Indexing an individual element, as with arr[0,0], works the same as with a 1d array.
This 2d arr has the same databuffer as the 1d arr.ravel(); the shape and strides are different. And the distinction between view, copy and item still applies.
A common way of implementing 2d arrays in C is to have an array of pointers to other arrays. numpy takes a different, strided approach, with just one flat array of data, and uses shape and strides parameters to implement the traversal. So a subarray requires its own shape and strides as well as a pointer to the shared databuffer.
1d array indexing
I'll try to illustrate what is going on when you index an array:
The array is an object with various attributes such as shape, and a data buffer. The buffer stores the data as bytes (in a C array), not as Python numeric objects. You can see information on the array with:
or
One has the data pointer in hex, the other decimal. We usually don't reference it directly.
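As a minimal sketch of inspecting those attributes, assume a small 1d array named arr with an explicit 4-byte integer dtype (both the name and the dtype are assumptions made for illustration). The __array_interface__ dict and arr.ctypes.data both report the address of the data buffer:

```python
import numpy as np

# Assumption: a small 1d array with 4-byte integer items.
arr = np.arange(4, dtype=np.int32)

# The array interface dict holds the shape, the type string and the
# data pointer as a (address, read_only_flag) tuple.
info = arr.__array_interface__
addr = info['data'][0]

print(info['shape'], info['typestr'])
print(addr, hex(addr))           # the same pointer in decimal and in hex
print(arr.ctypes.data == addr)   # ctypes exposes the same address
```

The address itself changes from run to run, which is one reason it is rarely used directly.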
If I index an element, I get a new object:
It has some properties of an array, but not all. For example you can't assign to it. Notice also that its 'data' value is totally different.
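A small sketch of element indexing, assuming the same hypothetical arr as a 4-byte integer array. It shows that the selected item is a NumPy scalar rather than an ndarray, that item assignment on it fails, and that its reported data address does not point into the buffer of arr:

```python
import numpy as np

arr = np.arange(4, dtype=np.int32)   # assumed example array

x1 = arr[1]                          # index a single element
print(type(x1))
print(isinstance(x1, np.ndarray), isinstance(x1, np.generic))
print(x1.shape, x1.dtype)            # array-like attributes are present

try:
    x1[...] = 5                      # scalars do not support item assignment
    assigned = True
except TypeError:
    assigned = False
print(assigned)

# Compare the scalar's data address with the slot it was read from.
x1_addr = x1.__array_interface__['data'][0]
slot_addr = arr.__array_interface__['data'][0] + arr.itemsize
print(x1_addr == slot_addr)
```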
Make another selection from the same place - different id and different data:
Also if I change the array at this point, it does not affect the earlier selections:
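The two points above can be sketched together, again assuming a hypothetical 4-item integer array arr. Two selections from the same position are distinct objects with equal values, and neither follows a later change to the array:

```python
import numpy as np

arr = np.arange(4, dtype=np.int32)   # assumed example array

x1 = arr[1]
x2 = arr[1]        # second selection from the same position
print(x1 is x2)    # identity test on two separate objects
print(x1 == x2)    # value comparison

arr[1] = 10        # change the array in place
print(arr)
print(x1, x2)      # the earlier selections keep the old value
```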
x1 and x2 don't have the same id, and thus won't match with is, and they don't use the arr data buffer either. There's no record that either variable was derived from arr.
With slicing it is possible to get a view of the original array.
Its data pointer is 4 bytes larger than that of arr - that is, it points to the same buffer, just a different spot. And changing y does change arr (but not the independent x1).
I could even make a 0d view of this item:
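A sketch of the slicing case, assuming arr has 4-byte integer items so that the offset matches the 4 bytes mentioned above. The use of reshape(()) to obtain the 0d view is one possible way to do it and is an assumption here, not necessarily the construction used originally:

```python
import numpy as np

arr = np.arange(4, dtype=np.int32)   # assumed: 4-byte items
x1 = arr[1]                          # independent scalar, for comparison

y = arr[1:2]                         # a slice is a view on arr's buffer
offset = (y.__array_interface__['data'][0]
          - arr.__array_interface__['data'][0])
print(offset)                        # one item past the start of arr
print(y.base is arr)

y[0] = 10                            # writes through to arr
print(arr, x1)

z = y.reshape(())                    # 0d view of the same item
print(z.shape, np.shares_memory(z, arr))
z[...] = 20                          # also writes through to arr
print(arr)
```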
In Python code we normally don't work with objects like this. When we use the c-api or cython it is possible to access the data buffer directly. nditer is an iteration mechanism that works with 0d objects like this (either in Python or the c-api). In cython, typed memoryviews are particularly useful for low level access.
http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
https://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#c.NpyIter
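As a brief Python-level sketch of the nditer mechanism mentioned above (the array contents are an assumption for illustration), each value the iterator yields is a 0d array that views one element, so writing through it modifies the original array:

```python
import numpy as np

arr = np.arange(4, dtype=np.int32)   # assumed example array

ndims = []
# op_flags=['readwrite'] makes the yielded 0d arrays writable views.
with np.nditer(arr, op_flags=['readwrite']) as it:
    for v in it:
        ndims.append(v.ndim)         # record the dimension of each item
        v[...] = 2 * v               # write back through the 0d view

print(ndims)
print(arr)
```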
elementwise ==
In response to comment, Comparing NumPy object references
== is defined for arrays as an elementwise operation. It compares the values of the respective elements and returns a matching boolean array.
If such a comparison needs to be used in a scalar context (such as an if) it needs to be reduced to a single value, as with np.all or np.any.
The is test compares object ids (not just for numpy objects). It has limited value in practical coding. I use it most often in expressions like is None, where None is an object with a unique id, and which does not play nicely with equality tests.
I think that you have a misunderstanding about NumPy arrays. You think that sub-arrays in a multidimensional NumPy array (like in Python lists) are separate objects; well, they're not.
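To make the difference between the == and is tests concrete, here is a small sketch using the array from the question. The use of np.array_equal as a one-call value comparison is an addition for illustration:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
x0 = x[0]

same_object = x0 is x[0]      # identity: each indexing builds a new object
elementwise = x0 == x[0]      # boolean array, one entry per element
print(same_object)
print(elementwise)

# Reduce to a single bool before using the result in an if.
all_equal = bool(np.all(elementwise))
any_equal_other_row = bool(np.any(x0 == x[1]))
if all_equal:
    print("x0 and x[0] hold the same values")
print(any_equal_other_row)
print(np.array_equal(x0, x[0]))
```

In practice the value comparison is usually what matters, and is stays reserved for checks like is None.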
A NumPy array, regardless of its dimension, is just one object. That's because NumPy creates the array at the C level, and when it is loaded as a Python object it can't be broken down into multiple objects. As a result Python has to create a new object to hold the selected part whenever you use something like split(), __getitem__, take(), etc., which as a matter of fact is just the way Python abstracts list-like behavior for NumPy arrays.
You can also check this in real time like the following:
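One possible check, sketched with the array from the question (the variable names a, b and c are chosen for illustration): every indexing call returns a fresh Python object, yet those objects still refer to the single underlying array, whereas a list hands back the very object it stores.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

a = x[0]
b = x[0]
print(id(a) == id(b))              # two separate wrapper objects
print(a.base is x, b.base is x)    # both are views on the one array x
print(np.shares_memory(a, b))

c = [[1, 2, 3], [1]]
print(c[0] is c[0])                # a list stores references to objects

b[0] = 100                         # a write through one view...
print(a[0], x[0, 0])               # ...is visible through the other and x
```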
So as soon as you have an array or any mutable object that can hold other objects in it, you'll have a Python mutable object, and therefore you will lose the performance and all the other cool features of NumPy arrays.
Also, as @Imanol mentioned in the comments, you may want to use NumPy view objects if you want memory-efficient and flexible operations when modifying an array (or arrays) through references. view objects can be constructed in the following two ways:
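A hedged sketch of two common ways to obtain a view, basic slicing/indexing and the ndarray.view() method. That these are the two ways originally meant is an assumption; the fancy-indexing copy at the end is included only for contrast:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

row = x[0]          # 1) basic indexing/slicing returns a view
whole = x.view()    # 2) ndarray.view() gives a new array object on the same data

row[0] = 10         # both writes land in x's buffer
whole[1, 2] = 60
print(x)
print(row.base is x, whole.base is x)

# Indexing with a list (fancy indexing) returns a copy instead.
copied = x[[0]]
copied[0, 0] = -1
print(x[0, 0])      # unchanged by the write to the copy
```

Keeping a view such as row around is the closest equivalent to holding a reference to an inner list.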