Say I have two very big matrices A (M-by-N) and B (N-by-M). I need the diagonal of A*B. Computing the full A*B requires M*M*N scalar multiplications, while computing only its diagonal requires just M*N, since there is no need to compute the elements that end up off the diagonal.

Does MATLAB realize this and optimize diag(A*B) on the fly, automagically, or am I better off using a for loop in this case?
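For concreteness, a minimal sketch of the identity behind the question (sizes here are illustrative only): each diagonal element of A*B is the dot product of row i of A with column i of B, so only M dot products of length N are needed.

```matlab
M = 4; N = 3;                 % illustrative sizes
A = rand(M, N);
B = rand(N, M);

d = zeros(M, 1);
for i = 1:M
    d(i) = A(i,:) * B(:,i);   % one length-N dot product per diagonal element
end

max(abs(d - diag(A*B)))       % agrees with the full product up to round-off
```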
One can also implement diag(A*B) as sum(A.*B', 2). Let's benchmark this along with all the other implementations/solutions suggested for this question.

The different methods, implemented as functions, are listed below for benchmarking purposes:
Sum-multiplication method-1
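A sketch of how this method might be packaged as a function for benchmarking; the function name is an assumption, and the one-liner is the sum(A.*B',2) form quoted above:

```matlab
function out = sum_mult_method1(A, B)
% diag(A*B) via elementwise multiplication: row i of A times row i
% of B' gives the term-by-term products whose sum is the i-th
% diagonal element of A*B.
out = sum(A.*B', 2);
end
```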
Sum-multiplication method-2
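Presumably the transposed counterpart, summing down columns instead of across rows (again, the function name is an assumption):

```matlab
function out = sum_mult_method2(A, B)
% Same identity, arranged column-wise: sum down the columns of
% A'.*B, then transpose the resulting 1-by-M row back to a column.
out = sum(A'.*B, 1)';
end
```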
For-loop method
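A sketch of the loop-based variant, computing each diagonal element as an explicit dot product:

```matlab
function out = forloop_method(A, B)
% One length-N dot product per diagonal element; never forms A*B.
M = size(A, 1);
out = zeros(M, 1);
for i = 1:M
    out(i) = A(i,:) * B(:,i);
end
end
```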
Full/Direct-multiplication method
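The baseline that forms the full product and discards everything off the diagonal:

```matlab
function out = full_mult_method(A, B)
% Form the full M-by-M product, then keep only its diagonal.
out = diag(A*B);
end
```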
Bsxfun-method
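A sketch of the bsxfun variant; since A and B' already have matching sizes, bsxfun performs no actual expansion here, but its multithreaded elementwise kernel was often quicker in older MATLAB releases:

```matlab
function out = bsxfun_method(A, B)
% Elementwise products via bsxfun, summed across each row.
out = sum(bsxfun(@times, A, B'), 2);
end
```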
Benchmarking Code
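A sketch of what the tic/toc harness might look like; the size grid, N, and run count are assumptions, not the original benchmark settings (M from 100 to 1000 matches the discussion below):

```matlab
M_arr = [100 200 500 1000];   % assumed sizes
N = 100;                      % assumed fixed inner dimension
num_runs = 10;                % average over several runs
funcs = {@sum_mult_method1, @sum_mult_method2, @forloop_method, ...
         @full_mult_method, @bsxfun_method};
timings = zeros(numel(M_arr), numel(funcs));

for k = 1:numel(M_arr)
    A = rand(M_arr(k), N);
    B = rand(N, M_arr(k));
    for f = 1:numel(funcs)
        tic;
        for r = 1:num_runs
            out = funcs{f}(A, B); %#ok<NASGU>
        end
        timings(k, f) = toc / num_runs;
    end
end
```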
Results
Intermediate Conclusions
Looks like the sum-multiplication methods are the best approaches, though the bsxfun approach seems to be catching up with them as M increases from 100 to 1000.

Next, higher benchmarking sizes were tested with just the sum-multiplication and bsxfun methods.
Alternate benchmarking Code (with `timeit`)
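A sketch of the same comparison driven by timeit, which handles warm-up and repetition internally; the size grid (reaching past M = 5000) is a guess based on the conclusions below:

```matlab
M_arr = [1000 2000 5000 10000];   % assumed sizes
N = 100;                          % assumed fixed inner dimension
funcs = {@sum_mult_method1, @sum_mult_method2, @bsxfun_method};
timings = zeros(numel(M_arr), numel(funcs));

for k = 1:numel(M_arr)
    A = rand(M_arr(k), N);
    B = rand(N, M_arr(k));
    for f = 1:numel(funcs)
        % timeit calls the function repeatedly and returns a median time
        timings(k, f) = timeit(@() funcs{f}(A, B));
    end
end
```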
Plot
Final Conclusions
It seems the sum-multiplication method is great until a certain point, around the M=5000 mark, after which bsxfun seems to have a slight upper hand.

Future Work

One can look into varying N and study the performance of the implementations mentioned here.

You can compute only the diagonal elements without a loop: just use sum(A.*B', 2) or sum(A'.*B)'.
Actually, you can do this faster than a for loop using the wonders of bsxfun:
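A sketch of the presumable one-liner (the variable name d is an assumption):

```matlab
% Elementwise products of A with B transposed, summed across rows.
d = sum(bsxfun(@times, A, B.'), 2);
```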
This is about twice as fast as the explicit for loop on my machine for large matrices (2,000-by-2,000 and bigger), and it is faster for matrices larger than about 500-by-500.

Note that all of these methods will produce numerically different results because of the different orders of summation and multiplication.
Yes, this is one of the rare cases where a for loop is better.
I ran the following script through the profiler:
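A sketch of what such a profiler script might look like, with assumed sizes; each variant gets fresh inputs, as described below:

```matlab
M = 1000; N = 1000;                 % assumed sizes

A = rand(M, N); B = rand(N, M);     % variant 1: full product, then diag
product = A*B;
d1 = diag(product);

A = rand(M, N); B = rand(N, M);     % variant 2: diag of the product directly
d2 = diag(A*B);

A = rand(M, N); B = rand(N, M);     % variant 3: explicit for loop
d3 = zeros(M, 1);
for i = 1:M
    d3(i) = A(i,:) * B(:,i);
end
```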
I reset A and B between each test just in case MATLAB would try to speed anything up by caching.
Result (edited for brevity):
As we can see, the for loop variant is an order of magnitude faster than the other two in this case. While the diag(A*B) variant is actually faster than the diag(product) variant, it's marginal at best.

I tried some different values of M and N, and in my tests the for loop variant is slower only if M=1.