I have to calculate some products in the form A'A
or more general A'DA
, where A
is a general mxn
matrix and D
is a diagonal mxm
matrix. Both of them are full rank; i.e.rank(A)=min(m,n)
.
I know that you can save a substantial time is such symmetric products: given that A'A
is symmetric, you only have to calculate the lower --or upper-- diagonal part of the product matrix. That adds to n(n+1)/2
entries to be calculated, which is roughly the half of the typical n^2
for large matrices.
This is a great saving that I want to exploit, and I know I can implement the matrix-matrix multiply within a for
loop . However, so far I have been using BLAS, which is much faster than any for
loop implementation that I could write by myself, since it optimizes cache and memory management.
Is there a way to efficiently compute A'A
or even A'DA
using BLAS?
Thanks!
SYRK is fine for A'A. For A'DA you can use SYMM for one side of it eg V = A'D and then use intel MKL's GEMMT for W = V A. GEMMT is like GEMM except it takes advantage of the fact that the result matrix is symmetric, and thus only has to do roughly half the work.
You are look for
dsyrk
subroutine of BLAS.As described in the documentation:
In the case of
A'A
storing upper triangular is:For the
A'DA
there is no direct equivalent in BLAS. However you can usedsyr
in a for loop.