I recently moved to Python 3.5 and noticed that the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot function. For example, for 3d arrays:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b # Python 3.5+
d = np.dot(a, b)
The @ operator returns an array of shape:
c.shape
(8, 13, 13)
while the np.dot() function returns:
d.shape
(8, 13, 8, 13)
How can I reproduce the same result with numpy dot? Are there any other significant differences?
In mathematics, I think the dot in numpy makes more sense, since it gives the dot product when a and b are vectors, or the matrix multiplication when a and b are 2d arrays. For higher-dimensional arrays it is defined elementwise as

dot(a,b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

As for the matmul operation in numpy, it consists of parts of the dot result, and for 3d arrays it can be defined as

matmul(a,b)[i,j,k] = sum(a[i,j,:] * b[i,:,k])

So, you can see that matmul(a,b) returns an array with a smaller shape, which has smaller memory consumption and makes more sense in applications. In particular, combining with broadcasting (when b is a single 2d matrix), you can get

matmul(a,b)[i,j,k] = sum(a[i,j,:] * b[:,k])

for example.
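A small sketch of that broadcasting case, with arbitrary shapes, where b is a single 2d matrix applied to every matrix in the stack a:

import numpy as np
a = np.random.rand(8, 13, 13)   # a stack of 8 matrices of shape 13x13
b = np.random.rand(13, 5)       # a single 13x5 matrix
c = np.matmul(a, b)             # b is broadcast against every matrix in the stack
print(c.shape)                  # (8, 13, 5)
# each entry satisfies matmul(a,b)[i,j,k] = sum(a[i,j,:] * b[:,k])
print(np.allclose(c[2, 4], a[2, 4] @ b))   # True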
From the above two definitions, you can see the requirements to use those two operations. Assume a.shape = (s1, s2, s3, s4) and b.shape = (t1, t2, t3, t4).

To use dot(a,b) you need t3 = s4.

To use matmul(a,b) you need t3 = s4, and t2 = s2 and t1 = s1 (or the leading dimensions must be broadcastable).
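For instance, with 4-dimensional inputs that satisfy these requirements (the sizes here are arbitrary):

import numpy as np
a = np.random.rand(5, 6, 2, 3)
b = np.random.rand(5, 6, 3, 4)
print(np.matmul(a, b).shape)   # (5, 6, 2, 4): the stacked matrices are multiplied pairwise
print(np.dot(a, b).shape)      # (5, 6, 2, 5, 6, 4): sum product over the last axis of a
                               # and the second-to-last axis of b, for every pair of stacks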
Use the following piece of code to convince yourself.
Code sample
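A sketch of such a check (the array sizes are arbitrary): it verifies that every entry of matmul(a,b) also appears inside dot(a,b) at the matching stack indices.

import numpy as np
a = np.random.rand(5, 6, 2, 3)
b = np.random.rand(5, 6, 3, 4)
c = np.matmul(a, b)   # shape (5, 6, 2, 4)
d = np.dot(a, b)      # shape (5, 6, 2, 5, 6, 4)
# matmul(a,b)[i,j,k,l] equals dot(a,b)[i,j,k,i,j,l]
for i in range(5):
    for j in range(6):
        assert np.allclose(c[i, j], d[i, j, :, i, j, :])
print("every matmul entry appears inside the dot result")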
The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.
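For the arrays from the question this can be checked directly (a quick sketch):

import numpy as np
a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)
print(np.matmul(a, b).shape)                 # (8, 13, 13)
print(np.allclose(a @ b, np.matmul(a, b)))   # True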
From the documentation:

matmul differs from dot in two important ways:

- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements.
The last point makes it clear that the dot and matmul methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:

For matmul:

If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

For np.dot:

For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b.
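Both rules are easy to see with the arrays from the question (a sketch):

import numpy as np
a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)
# matmul: a stack of 8 matrix products, one per leading index
print(np.matmul(a, b).shape)    # (8, 13, 13)
# dot: sum product over the last axis of a and the second-to-last of b,
# taken for every combination of the remaining axes
print(np.dot(a, b).shape)       # (8, 13, 8, 13)
# a single dot "block" pairs matrix i of a with matrix k of b
print(np.allclose(np.dot(a, b)[3, :, 5, :], a[3] @ b[5]))   # True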
The answer by @ajcr explains how dot and matmul (invoked by the @ symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on 'stacks of matrices' or tensors.

To clarify the differences, take a 4x4 array and return the dot product and matmul product with a 2x4x3 'stack of matrices' or tensor; a sketch of such a comparison appears below. Notice how the dot product is the sum product over the last axis of the first array and the second-to-last axis of the second, and how the matmul product is formed by broadcasting the 4x4 matrix across the stack and multiplying the matrices together.
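A sketch of that comparison, using simple integer arrays so the products are easy to read (the values and variable names are illustrative):

import numpy as np

fourbyfour = np.arange(16).reshape(4, 4)
twobyfourbythree = np.arange(24).reshape(2, 4, 3)

# dot: sum product over the last axis of the 4x4 array and the
# second-to-last axis of the stack -> shape (4, 2, 3)
print('dot shape:', np.dot(fourbyfour, twobyfourbythree).shape)
print(np.dot(fourbyfour, twobyfourbythree))

# matmul: the 4x4 matrix is broadcast against each 4x3 matrix in the
# stack and multiplied -> shape (2, 4, 3)
print('matmul shape:', np.matmul(fourbyfour, twobyfourbythree).shape)
print(np.matmul(fourbyfour, twobyfourbythree))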