fast way of computing diagonals of XMX^T in python

Published 2020-04-08 14:27

Question:

I need to compute the diagonal of XMX^T without a for-loop; in other words, I want to replace the following loop:

import numpy as np

X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)
out = np.zeros(10000)
for n in range(10000):
    out[n] = np.dot(np.dot(X[n, :], M), X[n, :])

I know I should somehow be using numpy.einsum, but I have not been able to figure out how.

Many thanks!

Answer 1:

Sure there is an np.einsum way, like so -

np.einsum('ij,ij->i',X.dot(M),X)

This leverages fast BLAS-backed matrix multiplication for the first step, X.dot(M), and then uses np.einsum to keep the first axis while sum-reducing the second.
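As a side note (mine, not from the original answer), the same row-wise reduction can be written without einsum by multiplying elementwise and summing along axis 1. A small sketch checking that the loop, the einsum form, and the elementwise form agree:

```python
import numpy as np

# Small shapes mirroring the question's (n, k) layout; values are arbitrary.
rng = np.random.default_rng(0)  # assumes NumPy >= 1.17
X = rng.standard_normal((5, 3))
M = rng.random((3, 3))

# Loop reference from the question: out[n] = X[n] @ M @ X[n]
ref = np.array([X[n] @ M @ X[n] for n in range(X.shape[0])])

# The answer's einsum form: row-wise dot of X.dot(M) with X
out_einsum = np.einsum('ij,ij->i', X.dot(M), X)

# Equivalent non-einsum form: elementwise product, then sum over axis 1
out_mul = (X.dot(M) * X).sum(axis=1)

print(np.allclose(ref, out_einsum), np.allclose(ref, out_mul))
```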

Runtime test -

This section compares all the approaches posted thus far to solve the problem.

In [132]: # Setup input arrays
     ...: X = np.random.randn(10000, 100)
     ...: M = np.random.rand(100, 100)
     ...: 
     ...: def original_app(X,M):
     ...:     out = np.zeros(10000)
     ...:     for n in range(10000):
     ...:       out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
     ...:     return out
     ...: 

In [133]: np.allclose(original_app(X,M),np.einsum('ij,ij->i',X.dot(M),X))
Out[133]: True

In [134]: %timeit original_app(X,M) # Original solution
10 loops, best of 3: 97.8 ms per loop

In [135]: %timeit np.dot(X, np.dot(M,X.T)).trace()  # @Colonel Beauvel's solution
1 loops, best of 3: 2.24 s per loop

In [136]: %timeit np.einsum('ij,jk,ik->i', X, M, X) # @hpaulj's solution
1 loops, best of 3: 442 ms per loop

In [137]: %timeit np.einsum('ij,ij->i',X.dot(M),X) # Proposed in this post
10 loops, best of 3: 28.1 ms per loop


Answer 2:

Here is a simpler example. Note that M must be square for the product XMX^T to be defined, so the shapes below are chosen accordingly:

M = array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])

X = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])

What you are looking for - the sum of the diagonal elements - is more commonly known in maths as the trace. You can obtain the trace of your matrix product, without a loop, by:

In [102]: np.dot(X, np.dot(M, X.T)).trace()
Out[102]: 6648
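Worth spelling out (my note, not the answerer's): the trace is the sum of the per-row quadratic forms, so it collapses the diagonal the question asked for into a single scalar. A minimal sketch of that relationship, assuming a small square M so the shapes line up:

```python
import numpy as np

X = np.arange(12).reshape(3, 4).T   # shape (4, 3), same values as the X above
M = np.arange(9).reshape(3, 3)      # square, as X M X^T requires

# Per-row quadratic forms out[n] = X[n] @ M @ X[n] -- the diagonal of X M X^T
out = np.einsum('ij,ij->i', X.dot(M), X)

# The trace equals the sum of those per-row values
trace = np.dot(X, np.dot(M, X.T)).trace()
print(out, trace, trace == out.sum())
```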


Answer 3:

In [210]: X=np.arange(12).reshape(4,3)
In [211]: M=np.ones((3,3))

In [212]: out = np.zeros(4)
In [213]: for n in range(4):
     ...:     out[n] = np.dot(np.dot(X[n,:], M), X[n,:])
     ...: 

In [214]: out
Out[214]: array([   9.,  144.,  441.,  900.])

One einsum approach:

In [215]: np.einsum('ij,jk,ik->i', X, M, X)
Out[215]: array([   9.,  144.,  441.,  900.])

Comparing with the other einsum approach:

In [218]: timeit np.einsum('ij,jk,ik->i', X, M, X)
100000 loops, best of 3: 8.98 µs per loop

In [219]: timeit np.einsum('ij,ij->i',X.dot(M),X)
100000 loops, best of 3: 11.9 µs per loop

This is a bit faster here, but results may differ at your larger sizes.
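One more consideration (an aside, not from the original answers): NumPy >= 1.12 accepts an optimize flag in np.einsum, which lets the three-operand form pick a cheaper contraction order (effectively doing X @ M first), narrowing the gap at large sizes:

```python
import numpy as np

X = np.random.randn(1000, 50)
M = np.random.rand(50, 50)

# Direct three-operand contraction, as written in this answer
direct = np.einsum('ij,jk,ik->i', X, M, X)

# With optimize=True, einsum searches for a better contraction order,
# typically contracting 'ij,jk->ik' via matmul first and then reducing
optimized = np.einsum('ij,jk,ik->i', X, M, X, optimize=True)

print(np.allclose(direct, optimized))
```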

einsum saves computing a lot of unnecessary values (compare with the diagonal or trace approaches, which build the full matrix product first).
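To make that concrete (sizes here are illustrative, my own): the trace/diagonal route materializes a full n x n matrix just to read n numbers off its diagonal, while the einsum route never builds anything bigger than (n, k):

```python
import numpy as np

n, k = 1000, 50
X = np.random.randn(n, k)
M = np.random.rand(k, k)

# Trace/diagonal route: builds an (n, n) intermediate -- 1000 x 1000 here
full = np.dot(X, np.dot(M, X.T))

# einsum route: the largest intermediate is X.dot(M), shape (n, k)
diag = np.einsum('ij,ij->i', X.dot(M), X)

print(full.shape, diag.shape, np.allclose(np.diag(full), diag))
```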

Similar use of einsum - Combine Einsum Expressions