I want to multiply an n-dim stack of m×m matrices by an n-dim stack of vectors (length m), so that the resulting m×n array contains the dot product of the nth matrix and the nth vector in its nth entry:
vec1=np.array([0,0.5,1,0.5]); vec2=np.array([2,0.5,1,0.5])
vec=np.transpose(n.stack((vec1,vec2)))
mat = np.moveaxis(n.array([[[0,1,2,3],[0,1,2,3],[0,1,2,3],[0,1,2,3]],[[-1,2.,0,1.],[0,0,-1,2.],[0,1,-1,2.],[1,0.1,1,1]]]),0,2)
outvec=np.zeros((4,2))
for i in range(2):
    outvec[:,i]=np.dot(mat[:,:,i],vec[:,i])
Inspired by this post, Element wise dot product of matrices and vectors, I have tried all different permutations of index combinations in einsum, and have found that
np.einsum('ijk,jk->ik',mat,vec)
gives the correct result.
Unfortunately I really do not understand this - I assumed that repeating the index k in the 'ijk,jk' part means that I multiply AND sum over k. I've tried to read the documentation https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.einsum.html, but I still don't understand.
(My previous attempts included,
np.einsum('ijk,il->ik', mat, vec)
I'm not even sure what this means. What happens to the index l when I drop it?)
Thanks in advance!
In [321]: vec1=np.array([0,0.5,1,0.5]); vec2=np.array([2,0.5,1,0.5])
...: vec=np.transpose(np.stack((vec1,vec2)))
In [322]: vec1.shape
Out[322]: (4,)
In [323]: vec.shape
Out[323]: (4, 2)
A nice thing about the stack function is we can specify an axis, skipping the transpose:
In [324]: np.stack((vec1,vec2), axis=1).shape
Out[324]: (4, 2)
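A quick check of my own, reusing vec1, vec2 and vec from above - the axis=1 stack is exactly the array the transpose route builds:

np.array_equal(np.stack((vec1, vec2), axis=1), vec)   # True: same (4,2) array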
Why the mix of np. and n.? That raises NameError: name 'n' is not defined. That kind of thing almost sends me away.
In [326]: mat = np.moveaxis(np.array([[[0,1,2,3],[0,1,2,3],[0,1,2,3],[0,1,2,3]],[[-1,2.,0
...: ,1.],[0,0,-1,2.],[0,1,-1,2.],[1,0.1,1,1]]]),0,2)
In [327]: mat.shape
Out[327]: (4, 4, 2)
In [328]: outvec=np.zeros((4,2))
...: for i in range(2):
...: outvec[:,i]=np.dot(mat[:,:,i],vec[:,i])
...:
In [329]: outvec
Out[329]:
array([[ 4. , -0.5 ],
[ 4. , 0. ],
[ 4. , 0.5 ],
[ 4. , 3.55]])
In [330]: # (4,4,2) (4,2) 'kji,ji->ki'
From your loop, the location of the i axis (size 2) is clear - last in all 3 arrays. That leaves one axis for vec; let's call that j. It pairs with the second-to-last axis of mat (the one next to i). k carries over from mat to outvec.
In [331]: np.einsum('kji,ji->ki', mat, vec)
Out[331]:
array([[ 4. , -0.5 ],
[ 4. , 0. ],
[ 4. , 0.5 ],
[ 4. , 3.55]])
Often the einsum string writes itself. For example, if mat were described as (m,n,k) and vec as (n,k), with the result being (m,k), the string would be 'mnk,nk->mk'. In this case only the j dimension (n in that alternate naming) is summed - it appears on the left, but not on the right. The last dimension, i in my notation, is not summed because it appears on both sides, just as it does in your iteration. I think of that as 'going along for the ride'. It isn't actively part of the dot product.
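To see the difference concretely (a sketch of my own, reusing mat, vec and outvec from above): if i were left out of the output subscripts it would also be summed, collapsing the stack, which is not what we want here:

np.einsum('kji,ji->k', mat, vec)                                     # shape (4,): i summed too
np.allclose(np.einsum('kji,ji->k', mat, vec), outvec.sum(axis=1))    # True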
You are, in effect, stacking on the last dimension, the size-2 one. Usually we stack on the first, but you transpose both to put it last.
Your 'failed' attempt runs, and can be reproduced as:
In [332]: np.einsum('ijk,il->ik', mat, vec)
Out[332]:
array([[12. , 4. ],
[ 6. , 1. ],
[12. , 4. ],
[ 6. , 3.1]])
In [333]: mat.sum(axis=1)*vec.sum(axis=1)[:,None]
Out[333]:
array([[12. , 4. ],
[ 6. , 1. ],
[12. , 4. ],
[ 6. , 3.1]])
The j and l dimensions don't appear on the right, so they are summed. They can be summed before multiplying because they appear in only one term each. I added the None to enable broadcasting (multiplying an ik array by an i array). The same thing written as an einsum:
np.einsum('ik,i->ik', mat.sum(axis=1), vec.sum(axis=1))
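To make the equivalence concrete (my own check, assuming mat and vec as defined above), all three spellings agree:

a1 = np.einsum('ijk,il->ik', mat, vec)                         # the 'failed' attempt
a2 = mat.sum(axis=1) * vec.sum(axis=1)[:,None]                 # sum each term first, then broadcast-multiply
a3 = np.einsum('ik,i->ik', mat.sum(axis=1), vec.sum(axis=1))   # same thing, spelled as an einsum
np.allclose(a1, a2) and np.allclose(a1, a3)                    # True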
If you'd stacked on the first axis instead, and added a dimension to vec to give it shape (2,4,1), it would matmul with a (2,4,4) mat, i.e. mat @ vec[...,None]:
In [336]: v1 = vec.T                    # shape (2,4): vec stacked on the first axis
In [337]: m1 = mat.transpose(2,0,1)     # shape (2,4,4)
In [338]: m1@v1[...,None]
Out[338]:
array([[[ 4. ],
[ 4. ],
[ 4. ],
[ 4. ]],
[[-0.5 ],
[ 0. ],
[ 0.5 ],
[ 3.55]]])
In [339]: _.shape
Out[339]: (2, 4, 1)
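Dropping the trailing size-1 axis and transposing recovers the original (4,2) layout (my own check, assuming v1 = vec.T as above):

np.allclose((m1 @ v1[...,None])[...,0].T, outvec)   # True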
Read up on Einstein summation notation.
Basically, the rules are:
Without a ->
- Any letter repeated in the inputs represents an axis to be multiplied and summed over
- Any letter not repeated in the inputs is included in the output
With a ->
- Any letter repeated in the inputs represents an axis to be multiplied over
- Any letter not in the output represents an axis to be summed over
So, for example, with matrices A and B with the same shape:
np.einsum('ij, ij', A, B) # is A ddot B, returns 0d scalar
np.einsum('ij, jk', A, B) # is A dot B, returns 2d tensor
np.einsum('ij, kl', A, B) # is outer(A, B), returns 4d tensor
np.einsum('ji, jk, kl', A, B, A) # is A.T @ B @ A, returns 2d tensor
np.einsum('ij, ij -> ij', A, B) # is A * B, returns 2d tensor
np.einsum('ij, ij -> i' , A, A)  # is norm(A, axis = 1)**2, returns 1d tensor
np.einsum('ii' , A) # is tr(A), returns 0d scalar
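A quick way to convince yourself of these identities is to check a few of them on random arrays (my own sketch, not part of the original answer):

import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)
print(np.allclose(np.einsum('ij, jk', A, B), A @ B))          # matrix product
print(np.allclose(np.einsum('ij, ij -> ij', A, B), A * B))    # elementwise product
print(np.allclose(np.einsum('ii', A), np.trace(A)))           # trace
print(np.allclose(np.einsum('ij, ij -> i', A, A),
                  np.linalg.norm(A, axis=1)**2))               # squared row norms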
einsum is easy (when you have played with permutations of indices for a while, that is...).
Let's work with something simple: a stack of three 2×2 matrices and a stack of three length-2 vectors.
import numpy as np
a = np.arange(3*2*2).reshape((3,2,2))
b = np.arange(3*2).reshape((3,2))
We need to know what we are going to compute using einsum:
In [101]: for i in range(3):
...: print(a[i]@b[i])
[1 3]
[23 33]
[77 95]
What have we done? We have an index i that is fixed while we perform a dot product between one of the stacked matrices and one of the stacked vectors (both indexed by i), and each individual output line implies a summation over the last index of the stacked matrix and the lone index of the stacked vector.
This is easily encoded in an einsum directive:
- we want the same index i to specify the matrix, the vector and also the output,
- we want to reduce along the last matrix index and the remaining vector index, say k,
- we want to have as many columns in the output as there are rows in each stacked matrix, say j.
Hence
In [102]: np.einsum('ijk,ik->ij', a, b)
Out[102]:
array([[ 1, 3],
[23, 33],
[77, 95]])
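As a final check (my addition), the directive reproduces the loop above:

looped = np.stack([a[i] @ b[i] for i in range(3)])
np.array_equal(np.einsum('ijk,ik->ij', a, b), looped)   # True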
I hope that my discussion of how I got the directive right is clear, correct and useful.