In numpy, what does indexing an array with the emp

Posted 2020-06-01 08:35

Question:

I just discovered, by chance, that an array in numpy may be indexed by an empty tuple:

In [62]: a = arange(5)

In [63]: a[()]
Out[63]: array([0, 1, 2, 3, 4])

I found some documentation on the numpy wiki ZeroRankArray:

(Sasha) First, whatever choice is made for x[...] and x[()] they should be the same because ... is just syntactic sugar for "as many : as necessary", which in the case of zero rank leads to ... = (:,)*0 = (). Second, rank zero arrays and numpy scalar types are interchangeable within numpy, but numpy scalars can be used in some python constructs where ndarrays can't.

So, for 0-d arrays a[()] and a[...] are supposed to be equivalent. Are they for higher-dimensional arrays, too? They strongly appear to be:

In [65]: a = arange(25).reshape(5, 5)

In [66]: a[()] is a[...]
Out[66]: False

In [67]: (a[()] == a[...]).all()
Out[67]: True

In [68]: a = arange(3**7).reshape((3,)*7)

In [69]: (a[()] == a[...]).all()
Out[69]: True

But the two are not mere syntactic sugar for each other, neither for a high-dimensional array nor even for a 0-d array:

In [76]: a[()] is a
Out[76]: False

In [77]: a[...] is a
Out[77]: True

In [79]: b = array(0)

In [80]: b[()] is b
Out[80]: False

In [81]: b[...] is b
Out[81]: True

And then there is the case of indexing by an empty list, which does something else altogether, but appears equivalent to indexing with an empty ndarray:

In [78]: a[[]]
Out[78]: array([], shape=(0, 3, 3, 3, 3, 3, 3), dtype=int64)

In [86]: a[arange(0)]
Out[86]: array([], shape=(0, 3, 3, 3, 3, 3, 3), dtype=int64)

In [82]: b[[]]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)

IndexError: 0-d arrays can't be indexed.

So, it appears that () and ... are similar but not quite identical and indexing with [] means something else altogether. And a[] or b[] are SyntaxErrors. Indexing with lists is documented at index arrays, and there is a short notice about indexing with tuples at the end of the same document.
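The difference between the empty tuple and the empty list can be checked side by side. This is a minimal sketch, assuming a current NumPy release imported as np. The wording of the IndexError may differ between versions, so only the exception type is checked here:

```python
import numpy as np

a = np.arange(3**4).reshape((3,) * 4)

# Empty tuple: basic indexing that selects everything and returns a view.
full = a[()]
print(full.shape)                  # (3, 3, 3, 3)
print(np.shares_memory(full, a))   # True

# Empty list: advanced (integer-array) indexing along axis 0 with zero
# indices, so the first axis of the result has length 0.
empty = a[[]]
print(empty.shape)                 # (0, 3, 3, 3)

# An empty integer array behaves the same way as the empty list.
empty_arr = a[np.arange(0)]
print(empty_arr.shape)             # (0, 3, 3, 3)

# A 0-d array has no axis to index into, so the empty list fails.
b = np.array(0)
try:
    b[[]]
    raised = False
except IndexError:
    raised = True
print(raised)                      # True
```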

That leaves the question:

Is the difference between a[()] and a[...] by design? What is the design, then?

(Question somehow reminiscent of: What does the empty `()` do on a Matlab matrix?)

Edit:

In fact, even scalars may be indexed by an empty tuple:

In [36]: numpy.int64(10)[()]
Out[36]: 10

Answer 1:

In the NumPy version discussed in the question, A[...] is handled as a special case, optimised to always return A itself:

if (op == Py_Ellipsis) {
    Py_INCREF(self);
    return (PyObject *)self;
}

Anything else that should be equivalent, e.g. A[:], A[(Ellipsis,)], A[()] or A[(slice(None),) * A.ndim], will instead return a view of the entirety of A, whose base is A:

>>> A[()] is A
False
>>> A[()].base is A
True

This seems an unnecessary and premature optimisation, as A[(Ellipsis,)] and A[()] will always give the same result (a view of the whole of A). Judging from https://github.com/numpy/numpy/commit/fa547b80f7035da85f66f9cbabc4ff75969d23cd, it was originally required because indexing with ... did not work properly on 0-d arrays: prior to https://github.com/numpy/numpy/commit/4156b241aa3670f923428d4e72577a9962cdf042 it would return the element as a scalar. The special case was then extended to all arrays for consistency. Indexing on 0-d arrays has since been fixed, so the optimisation is no longer required, but it has stuck around vestigially (and there is probably some code that depends on A[...] is A being true).
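The claim that these forms return views backed by A can be checked with .base and np.shares_memory. This is a minimal sketch, assuming NumPy is imported as np and that A owns its data. A[...] is left out on purpose, because whether it returns A itself appears to depend on the NumPy version (the outputs in the question and in a later answer differ):

```python
import numpy as np

A = np.arange(6)   # A owns its data, so A.base is None

candidates = {
    "A[()]": A[()],
    "A[:]": A[:],
    "A[(Ellipsis,)]": A[(Ellipsis,)],
    "A[(slice(None),) * A.ndim]": A[(slice(None),) * A.ndim],
}

for label, view in candidates.items():
    # Each result is a distinct ndarray object backed by A's buffer.
    print(label, view is A, view.base is A, np.shares_memory(view, A))

# Writing through a view changes A, which confirms that no copy was made.
A[()][0] = 99
print(A[0])   # 99
```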



Answer 2:

While in the example you've given the empty tuple and the ellipsis give a similar result, in general they serve different purposes. When indexing an array, A[i, j, k] == A[(i, j, k)] and specifically A[...] == A[(Ellipsis,)]. Here the tuple simply serves as a container for indexing elements. This can be useful when you need to manipulate the index as a variable; for example, you can do:

index = (0,) * A.ndim
A[index]

Notice that because the tuple is the container for indexing elements, it cannot be combined with other indices, for example A[(), 0] == A[[], 0] and A[(), 0] != A[..., 0].
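What Python actually hands to the indexing machinery can be seen without NumPy at all. The IndexRecorder class below is a hypothetical helper, not part of any library: its __getitem__ simply returns the index object it receives, which shows that () is the container while Ellipsis is an element:

```python
class IndexRecorder:
    """Hypothetical helper: return whatever object Python passes to __getitem__."""

    def __getitem__(self, index):
        return index


rec = IndexRecorder()

print(rec[()])                          # ()
print(rec[...])                         # Ellipsis
print(rec[(Ellipsis,)])                 # (Ellipsis,)
print(rec[0, 1, 2] == rec[(0, 1, 2)])   # True

# Inside a larger index, () is just one more element of the outer tuple,
# whereas ... stays the Ellipsis element.
print(rec[(), 0])                       # ((), 0)
print(rec[..., 0])                      # (Ellipsis, 0)
```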

Because an array A can be indexed with fewer indices than A.ndim, indexing with an empty tuple is a natural extension of that behavior, and it can be useful in some situations; for example, the above code snippet will work when A.ndim == 0.

In short, the tuple serves as a container for indexing elements, which is allowed to be empty, while the Ellipsis is one of the possible indexing elements.
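The index-as-variable pattern from this answer can be wrapped in a function that works for any number of dimensions, including zero. This is a minimal sketch, assuming NumPy is imported as np; first_element is a hypothetical name chosen for illustration:

```python
import numpy as np


def first_element(arr):
    """Hypothetical helper: return the element at index 0 along every axis."""
    arr = np.asarray(arr)
    index = (0,) * arr.ndim   # becomes the empty tuple when arr.ndim == 0
    return arr[index]


print(first_element(np.arange(24).reshape(2, 3, 4)))   # 0
print(first_element(np.array([7, 8, 9])))              # 7
print(first_element(np.array(3.14)))                   # 3.14
```

No special case is needed for the 0-d input, because the empty tuple is a valid full index there.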



Answer 3:

According to the official NumPy documentation, the difference is clear:

An empty (tuple) index is a full scalar index into a zero dimensional array. x[()] returns a scalar if x is zero dimensional and a view otherwise. On the other hand x[...] always returns a view.

When an ellipsis (...) is present but has no size (i.e. replaces zero :) the result will still always be an array. A view if no advanced index is present, otherwise a copy.

>>> import numpy as np
>>> # ---------------------------------- #
>>> # when `x` is at least 1 dimensional #
>>> # ---------------------------------- #
>>> x = np.linspace(0, 10, 100)
>>> x.shape
(100,)
>>> x.ndim
1
>>> a = x[()]
>>> b = x[...]
>>> id(x), id(a), id(b)
(4559933568, 4561560080, 4585410192)
>>> id(x.base), id(a.base), id(b.base)
(4560914432, 4560914432, 4560914432)
>>> # ---------------------------- #
>>> # when `z` is zero dimensional #
>>> # ---------------------------- #
>>> z = np.array(3.14)
>>> z.shape
()
>>> z.ndim
0
>>> a = z[()]
>>> b = z[...]
>>> type(a), type(b)
(<class 'numpy.float64'>, <class 'numpy.ndarray'>)
>>> id(z), id(a), id(b)
(4585422896, 4586829384, 4561560080)
>>> id(z.base), id(a.base), id(b.base)
(4557260904, 4557260904, 4585422896)
>>> b.base is z
True
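One practical use of the documented behaviour is to turn 0-d results into scalars while leaving higher-dimensional arrays alone. This is a minimal sketch, assuming NumPy is imported as np; unwrap is a hypothetical helper name, not a NumPy function:

```python
import numpy as np


def unwrap(x):
    """Hypothetical helper: 0-d arrays become scalars, other arrays stay arrays."""
    return np.asarray(x)[()]


total = unwrap(np.array(3.14))
print(type(total).__name__)            # float64

vec = unwrap(np.array([1.0, 2.0]))
print(type(vec).__name__, vec.shape)   # ndarray (2,)

# x[...] keeps the 0-d array wrapper, which allows in-place assignment.
z = np.array(3.14)
z[...] = 2.71
print(z[()])         # 2.71
print(z[...].ndim)   # 0
```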