numpy - einsum notation: dot product of a stack of matrices with a stack of vectors

Posted 2019-08-20 11:46

Question:

I want to multiply an n-dim stack of m×m matrices by an n-dim stack of vectors (each of length m), so that the resulting m×n array contains, in its nth column, the dot product of the nth matrix with the nth vector:

vec1=np.array([0,0.5,1,0.5]); vec2=np.array([2,0.5,1,0.5])
vec=np.transpose(n.stack((vec1,vec2)))
mat = np.moveaxis(n.array([[[0,1,2,3],[0,1,2,3],[0,1,2,3],[0,1,2,3]],[[-1,2.,0,1.],[0,0,-1,2.],[0,1,-1,2.],[1,0.1,1,1]]]),0,2)
outvec=np.zeros((4,2))
for i in range(2):
    outvec[:,i]=np.dot(mat[:,:,i],vec[:,i])

Inspired by this post Element wise dot product of matrices and vectors, I have tried various permutations of index combinations in einsum, and have found that

np.einsum('ijk,jk->ik',mat,vec)

gives the correct result.

Unfortunately I really do not understand this - I assumed that repeating the index k in the 'ijk,jk' part means that I multiply AND sum over k. I've tried to read the documentation https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.einsum.html, but I still don't understand.

(My previous attempts included,

 np.einsum('ijk,il->ik', mat, vec)

I'm not even sure what this means. What happens to the index l when I drop it?)

Thanks in advance!

Answer 1:

In [321]: vec1=np.array([0,0.5,1,0.5]); vec2=np.array([2,0.5,1,0.5])
     ...: vec=np.transpose(np.stack((vec1,vec2)))
In [322]: vec1.shape
Out[322]: (4,)
In [323]: vec.shape
Out[323]: (4, 2)

A nice thing about the stack function is we can specify an axis, skipping the transpose:

In [324]: np.stack((vec1,vec2), axis=1).shape
Out[324]: (4, 2)

Why the mix of np. and n.? NameError: name 'n' is not defined. That kind of thing almost sends me away.

In [326]: mat = np.moveaxis(np.array([[[0,1,2,3],[0,1,2,3],[0,1,2,3],[0,1,2,3]],[[-1,2.,0
     ...: ,1.],[0,0,-1,2.],[0,1,-1,2.],[1,0.1,1,1]]]),0,2)
In [327]: mat.shape
Out[327]: (4, 4, 2)

In [328]: outvec=np.zeros((4,2))
     ...: for i in range(2):
     ...:     outvec[:,i]=np.dot(mat[:,:,i],vec[:,i])
     ...:     
In [329]: outvec
Out[329]: 
array([[ 4.  , -0.5 ],
       [ 4.  ,  0.  ],
       [ 4.  ,  0.5 ],
       [ 4.  ,  3.55]])

In [330]: # (4,4,2) (4,2)   'kji,ji->ki'

From your loop, the location of the i axis (size 2) is clear - last in all 3 arrays. That leaves one axis for vec; let's call it j. It pairs with the second-to-last axis of mat (the one next to i). k carries over from mat to outvec.

In [331]: np.einsum('kji,ji->ki', mat, vec)
Out[331]: 
array([[ 4.  , -0.5 ],
       [ 4.  ,  0.  ],
       [ 4.  ,  0.5 ],
       [ 4.  ,  3.55]])

Often the einsum string writes itself. For example, if mat were described as (m,n,k) and vec as (n,k), with the result being (m,k), the string would be 'mnk,nk->mk'.

In this case only the j dimension is summed - it appears on the left, but not on the right. The last dimension, i in my notation, is not summed because it appears on both sides, just as it does in your iteration. I think of that as 'going-along-for-the-ride'. It isn't actively part of the dot product.
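To see the multiply-and-sum explicitly, the same result can be reproduced with plain broadcasting; a quick sketch using the arrays defined above:

# broadcast vec (4,2) against mat (4,4,2), multiply elementwise,
# then sum over the shared j axis (axis 1)
check = (mat * vec[None, :, :]).sum(axis=1)
np.allclose(check, outvec)   # True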

You are, in effect, stacking on the last dimension, the size-2 one. Usually we stack on the first, but you transposed both arrays to put that axis last.


Your 'failed' attempt runs, and can be reproduced as:

In [332]: np.einsum('ijk,il->ik', mat, vec)
Out[332]: 
array([[12. ,  4. ],
       [ 6. ,  1. ],
       [12. ,  4. ],
       [ 6. ,  3.1]])
In [333]: mat.sum(axis=1)*vec.sum(axis=1)[:,None]
Out[333]: 
array([[12. ,  4. ],
       [ 6. ,  1. ],
       [12. ,  4. ],
       [ 6. ,  3.1]])

The j and l dimensions don't appear on the right, so they are summed. They can be summed before multiplying because they each appear in only one term. I added the None to enable broadcasting (multiplying an (i,k) array with an (i,) array).

np.einsum('ik,i->ik', mat.sum(axis=1), vec.sum(axis=1))
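All three forms give the same array; a quick check, assuming the arrays defined above:

alt1 = np.einsum('ijk,il->ik', mat, vec)
alt2 = mat.sum(axis=1) * vec.sum(axis=1)[:, None]
alt3 = np.einsum('ik,i->ik', mat.sum(axis=1), vec.sum(axis=1))
np.allclose(alt1, alt2) and np.allclose(alt2, alt3)   # True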

If you'd stacked on the first axis instead, vec would be (2,4); adding a trailing dimension gives (2,4,1), which matmuls with a (2,4,4) mat: mat @ vec[...,None].

In [336]: v1 = vec.T                    # stack on the first axis: (2,4)
In [337]: m1 = mat.transpose(2,0,1)     # (2,4,4)
In [338]: m1@v1[...,None]
Out[338]: 
array([[[ 4.  ],
        [ 4.  ],
        [ 4.  ],
        [ 4.  ]],

       [[-0.5 ],
        [ 0.  ],
        [ 0.5 ],
        [ 3.55]]])
In [339]: _.shape
Out[339]: (2, 4, 1)
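Dropping the trailing dimension recovers the loop result, transposed; a quick check with the arrays above:

np.allclose((m1 @ v1[..., None])[..., 0], outvec.T)   # True
# the same contraction as an einsum, with the stack axis first:
np.einsum('ijk,ik->ij', m1, v1)   # equals outvec.T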


Answer 2:

Read up on Einstein summation notation.

Basically, the rules are:

Without a ->

  • Any letter repeated in the inputs represents an axis to be multiplied and summed over
  • Any letter not repeated in the inputs is included in the output (in alphabetical order)

With a ->

  • Any letter repeated in the inputs represents an axis to be multiplied over
  • Any letter not in the output represents an axis to be summed over

So, for example, with matrices A and B of the same (square) shape:

np.einsum('ij, ij',       A, B)     # is A ddot B (Frobenius inner product), returns 0d scalar
np.einsum('ij, jk',       A, B)     # is A dot  B,             returns 2d tensor
np.einsum('ij, kl',       A, B)     # is outer(A, B),          returns 4d tensor
np.einsum('ji, jk, kl',   A, B, A)  # is A.T @ B @ A,          returns 2d tensor
np.einsum('ij, ij -> ij', A, B)     # is A * B,                returns 2d tensor
np.einsum('ij, ij -> i' , A, A)     # is norm(A, axis=1)**2,   returns 1d tensor
np.einsum('ii',           A)        # is tr(A),                returns 0d scalar
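These identities are easy to verify; a quick sketch with random square matrices (shapes chosen so every product is defined):

import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)

np.allclose(np.einsum('ij, ij', A, B), (A * B).sum())        # Frobenius inner product
np.allclose(np.einsum('ij, jk', A, B), A @ B)                # matrix product
np.allclose(np.einsum('ji, jk, kl', A, B, A), A.T @ B @ A)   # quadratic form
np.allclose(np.einsum('ii', A), np.trace(A))                 # trace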


Answer 3:

einsum is easy (once you have played with permutations of indices for a while, that is...).

Let's work with something simple, a triple stack of 2×2 matrices and a triple stack of length-2 vectors:

import numpy as np

a = np.arange(3*2*2).reshape((3,2,2))
b = np.arange(3*2).reshape((3,2))

First we need to know what we want einsum to compute; here it is as an explicit loop:

In [101]: for i in range(3): 
     ...:     print(a[i]@b[i])                                                                            
[1 3]
[23 33]
[77 95]

What have we done? We have an index i that is fixed when we perform a dot product between one of the stacked matrices and one of the stacked vectors (both indexed by i), and each output line implies a summation over the last index of the stacked matrix and the lone index of the stacked vector.

This is easily encoded in an einsum directive

  • we want the same i index to specify the matrix, the vector and also the output,
  • we want to reduce along the last matrix index and the remaining vector index, say k
  • we want to have as many columns in the output as the rows in each stacked matrix, say j

Hence

In [102]: np.einsum('ijk,ik->ij', a, b)                                                                   
Out[102]: 
array([[ 1,  3],
       [23, 33],
       [77, 95]])
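The same contraction can be checked against the loop, or written with the @ operator; a sketch using a and b from above:

# batched matrix-vector product: add a trailing axis to b, matmul, drop it
np.allclose(np.einsum('ijk,ik->ij', a, b), (a @ b[..., None])[..., 0])   # True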

I hope that my discussion of how I got the directive right is clear, correct and useful.