A few examples:
numpy.sum()
ndarray.sum()
numpy.amax()
ndarray.max()
numpy.dot()
ndarray.dot()
... and quite a few more. Is it to support some legacy code, or is there a better reason for that? And, do I choose only on the basis of how my code 'looks', or is one of the two ways better than the other?
I can imagine that one might want numpy.dot() as a function so it can be used with reduce (e.g., reduce(numpy.dot, [A, B, C, D])), but I don't think that would be as useful for something like numpy.sum().
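As a minimal sketch of the reduce idea mentioned above: the function form can be passed around as a plain callable, which the bound method cannot do as directly. The matrices A, B, C and D here are made-up illustrative values, not taken from the question.

```python
from functools import reduce

import numpy as np

# Hypothetical 2x2 matrices, chosen only for illustration.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])
D = np.eye(2, dtype=int)

# reduce takes the function and ONE iterable of operands,
# and folds left to right: ((A . B) . C) . D
chained = reduce(np.dot, [A, B, C, D])

# The same product written with the method form.
explicit = A.dot(B).dot(C).dot(D)

print(np.array_equal(chained, explicit))
```

Both spellings compute the same product; the function form is simply easier to hand to higher-order helpers such as reduce.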
As others have noted, the identically-named NumPy functions and array methods are often equivalent (they end up calling the same underlying code). One might be preferred over the other if it makes for easier reading.
However, in some instances the two behave slightly differently. In particular, using the ndarray method sometimes emphasises the fact that the method is modifying the array in-place.
For example, np.resize returns a new array with the specified shape. On the other hand, ndarray.resize changes the shape of the array in-place. The fill values used in each case are also different.
Similarly, a.sort() sorts the array a in-place, while np.sort(a) returns a sorted copy.
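The in-place versus copy distinction described above can be seen in a short sketch. The array values are illustrative, and refcheck=False is passed to ndarray.resize only as an assumption to keep the example from failing in interactive sessions that hold extra references to the array.

```python
import numpy as np

a = np.array([3, 1, 2])

sorted_copy = np.sort(a)      # returns a new sorted array
unchanged = a.tolist()        # a itself is still [3, 1, 2] here

result = a.sort()             # sorts a in-place and returns None

b = np.array([1, 2, 3])
grown_copy = np.resize(b, 5)  # new array, filled by repeating b
b.resize(5, refcheck=False)   # in-place, new slots filled with zeros

print(sorted_copy, a, result)
print(grown_copy, b)
```

Note that the function pads by cycling through the original data, while the method pads with zeros, which is the difference in fill values mentioned above.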
In most cases the method is the basic compiled version. The function uses that method when available, but also has some sort of backup when the argument(s) is not an array. It helps to look at the code and/or docs of the function or method.
For example, if in IPython I ask to look at the code for the sum method, I see that it is compiled code:
In [711]: x.sum??
Type: builtin_function_or_method
String form: <built-in method sum of numpy.ndarray object at 0xac1bce0>
...
Refer to `numpy.sum` for full documentation.
Doing the same on np.sum, I get many lines of documentation plus some Python code:
if isinstance(a, _gentype):
    res = _sum_(a)
    if out is not None:
        out[...] = res
        return out
    return res
elif type(a) is not mu.ndarray:
    try:
        sum = a.sum
    except AttributeError:
        return _methods._sum(a, axis=axis, dtype=dtype,
                             out=out, keepdims=keepdims)
    # NOTE: Dropping the keepdims parameters here...
    return sum(axis=axis, dtype=dtype, out=out)
else:
    return _methods._sum(a, axis=axis, dtype=dtype,
                         out=out, keepdims=keepdims)
For a plain ndarray, np.sum(x) goes straight to _methods._sum. If x is some other object with its own sum method (an array subclass, say), it ends up calling x.sum():
sum = a.sum
return sum(axis=axis, dtype=dtype, out=out)
np.amax is similar (but simpler). Note that the np. form can handle an object that isn't an array (that doesn't have the method), e.g. a list: np.amax([1,2,3]).
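A small sketch of that point, using an illustrative list: the function accepts anything array-like, whereas the method only exists once you have an array.

```python
import numpy as np

values = [1, 2, 3]

# The function converts the list to an array internally.
largest = np.amax(values)

# A plain list has no .max method, so the method form is not available...
has_max_method = hasattr(values, "max")

# ...until the list is converted explicitly.
largest_via_method = np.asarray(values).max()

print(largest, has_max_method, largest_via_method)
```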
np.dot and x.dot both show up as 'built-in' functions, so we can't say anything about priority. They probably both end up calling some underlying C function.
np.reshape is another that delegates if possible:
try:
    reshape = a.reshape
except AttributeError:
    return _wrapit(a, 'reshape', newshape, order=order)
return reshape(newshape, order=order)
So np.reshape(x,(2,3)) is identical in functionality to x.reshape((2,3)). But the _wrapit expression enables np.reshape([1,2,3,4],(2,2)).
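To illustrate the two cases just described (a rough sketch with made-up values): for an array the function and method give the same result, and for a list only the function form works directly.

```python
import numpy as np

x = np.arange(6)

# Function and method forms agree for an actual array.
same = np.array_equal(np.reshape(x, (2, 3)), x.reshape((2, 3)))

# A list has no .reshape method, but the function wraps it into an array first.
from_list = np.reshape([1, 2, 3, 4], (2, 2))

print(same)
print(from_list)
```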
np.sort
returns a copy by doing an inplace sort on a copy:
a = asanyarray(a).copy()
a.sort(axis, kind, order)
return a
x.resize is built-in, while np.resize ends up doing a np.concatenate and reshape.
If your array is a subclass, like matrix or masked, it may have its own variant. The action of a matrix .sum
is:
return N.ndarray.sum(self, axis, dtype, out, keepdims=True)._collapse(axis)
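The effect of that subclass override can be seen in a brief sketch. The data is illustrative, and np.matrix is used only because it is the subclass under discussion (NumPy's own docs no longer recommend it for new code).

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
mat = np.matrix(arr)

# For a plain ndarray, summing over an axis drops that axis.
arr_sum = arr.sum(axis=0)

# The matrix variant keeps the result two-dimensional.
mat_sum = mat.sum(axis=0)

# The function form delegates to the subclass method, so it behaves the same way.
func_sum = np.sum(mat, axis=0)

print(arr_sum.shape, mat_sum.shape, func_sum.shape)
```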
Elaborating on Peter's comment for visibility:
We could make it more consistent by removing methods from ndarray and sticking to just functions. But this is impossible because it would break everyone's existing code that uses methods.
Or, we could move all functions to also be methods. But this is impossible because new users and packages are constantly defining new functions. Plus continuing to multiply these duplicate methods violates "there should be one obvious way to do it".
If we could go back in time then I'd probably argue for not having these methods on ndarray at all, and using functions exclusively. ... So this all argues for using functions exclusively
numpy issue: More consistency with array-methods #7452