Low-RAM C++ eigensolver


Question:

I'm a newbie in C++ programming, but I have a task: compute eigenvalues and eigenvectors (the standard eigenproblem Ax = lx) of symmetric (and Hermitian) matrices of size binomial(L, L/2), where L is about 18-22. For now I'm testing on a machine with about 7.7 GB of RAM available, but in the end I'll have access to a PC with 64 GB of RAM.

I started with Lapack++. At the beginning my project only needed to solve this problem for real symmetric matrices.

This library was great: very quick and light on RAM. It has an option to compute the eigenvectors and place them into the input matrix A to save memory, and it works! I thought the Lapack++ eigensolver could also handle Hermitian matrices, but it can't, for reasons unknown to me (maybe I'm doing something wrong). My project has evolved, and now I also need to solve this problem for Hermitian matrices.

So I tried switching to the Armadillo library. It works fine and of course supports Hermitian matrices, but it doesn't have the nice Lapack++ feature of overwriting the matrix A with the eigenvectors.

Some statistics for L = 14:

  • Lapack++: 126 MB RAM, 7.9 s, eigenvalues + eigenvectors

  • Armadillo: 216 MB RAM, 12 s, eigenvalues only

  • Armadillo: 396 MB RAM, 15 s, eigenvalues + eigenvectors

Let's do some math: a double takes about 8 bytes. My matrix has size binomial(14,7) = 3432, so in the ideal case it should take 3432^2 * 8 / 1024^2 ≈ 90 MB.

My question is: is it possible to modify or force Armadillo to do the same nice trick as Lapack++? Armadillo uses LAPACK and BLAS routines underneath. Or can someone recommend another approach to this problem using a different library?

P.S.: My matrix is really sparse: it has only about 2 * binomial(L, L/2) non-zero elements. I've tried computing this with SuperLU in CSC format, but it wasn't very effective: for L = 14 it used 185 MB of RAM but took 135 s.

Answer 1:

Both Lapack++ and Armadillo rely on LAPACK to compute the eigenvalues and eigenvectors of complex matrices. The LAPACK library provides different routines to perform these operations for complex Hermitian matrices:

  • The function zgeev() does not care about the matrix being Hermitian. This function is called by the Lapack++ library for matrices of type LaGenMatComplex in the function LaEigSolve, and by the function eig_gen() of the Armadillo library.

  • The function zheev() is dedicated to complex Hermitian matrices. It first calls zhetrd() to reduce the Hermitian matrix to tridiagonal form, then uses a QR algorithm to compute the eigenvalues and, if requested, the eigenvectors. The function eig_sym() of the Armadillo library calls this routine if the "std" method is selected.

  • The function zheevd() does the same thing as zheev() if the eigenvectors are not required. Otherwise, it makes use of a divide-and-conquer algorithm (see zstedc()). The function eig_sym() of the Armadillo library calls this routine if the "dc" method is selected. Since divide and conquer proved to be faster for large matrices, it is now the default method (see the Armadillo snippet just after this list).
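For reference, here is a minimal Armadillo sketch showing how these two methods are selected; this snippet only illustrates the mapping above (it uses a random Hermitian test matrix and assumes Armadillo is linked against LAPACK):

#include <armadillo>
#include <iostream>

int main()
{
    const arma::uword n = 1000;

    // build a random complex Hermitian test matrix
    arma::cx_mat A = arma::randu<arma::cx_mat>(n, n);
    arma::cx_mat H = 0.5 * (A + A.t());   // .t() is the conjugate transpose, so H is Hermitian

    arma::vec eigval;      // eigenvalues of a Hermitian matrix are real
    arma::cx_mat eigvec;

    // "std" -> zheev (QR algorithm), "dc" -> zheevd (divide and conquer, the default)
    arma::eig_sym(eigval, eigvec, H, "dc");

    std::cout << "largest eigenvalue = " << eigval(n - 1) << std::endl;   // sorted ascending
    return 0;
}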

Functions with more options are available in the LAPACK library (see zheevr() or zheevx()). If you want to keep a dense matrix format, you can also try the ComplexEigenSolver of the Eigen library.
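Since Eigen was mentioned: for a Hermitian matrix its dedicated dense solver is SelfAdjointEigenSolver (ComplexEigenSolver treats the matrix as a general complex one). A minimal sketch, assuming Eigen 3:

#include <Eigen/Dense>
#include <iostream>

int main()
{
    const int n = 1000;

    // random complex Hermitian test matrix
    Eigen::MatrixXcd A = Eigen::MatrixXcd::Random(n, n);
    Eigen::MatrixXcd H = 0.5 * (A + A.adjoint());

    // computes eigenvalues and eigenvectors; the eigenvalues are real and sorted in increasing order
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXcd> es(H);
    std::cout << "largest eigenvalue = " << es.eigenvalues()(n - 1) << std::endl;
    return 0;
}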

Here is a little C++ test using LAPACKE, the C wrapper of the LAPACK library. It is compiled with g++ main.cpp -o main2 -L /home/...../lapack-3.5.0 -llapacke -llapack -lblas

#include <iostream>

#include <complex>
#include <ctime>
#include <cstring>

#include "lapacke.h"

#undef complex
using namespace std;

int main()
{
    //int n = 3432;   // size of the L = 14 case from the question

    int n = 600;      // smaller size for a quick test

    // four identical copies of the test matrix: LAPACK overwrites its input,
    // so each routine below gets its own copy
    std::complex<double> *matrix=new std::complex<double>[n*n];
    memset(matrix, 0, n*n*sizeof(std::complex<double>));
    std::complex<double> *matrix2=new std::complex<double>[n*n];
    memset(matrix2, 0, n*n*sizeof(std::complex<double>));
    std::complex<double> *matrix3=new std::complex<double>[n*n];
    memset(matrix3, 0, n*n*sizeof(std::complex<double>));
    std::complex<double> *matrix4=new std::complex<double>[n*n];
    memset(matrix4, 0, n*n*sizeof(std::complex<double>));
    for(int i=0;i<n;i++){
        matrix[i*n+i]=42;
        matrix2[i*n+i]=42;
        matrix3[i*n+i]=42;
        matrix4[i*n+i]=42;
    }

    for(int i=0;i<n-1;i++){   // fill the sub- and super-diagonals (tridiagonal test matrix)
        matrix[i*n+(i+1)]=20;
        matrix2[i*n+(i+1)]=20;
        matrix3[i*n+(i+1)]=20;
        matrix4[i*n+(i+1)]=20;

        matrix[(i+1)*n+i]=20;
        matrix2[(i+1)*n+i]=20;
        matrix3[(i+1)*n+i]=20;
        matrix4[(i+1)*n+i]=20;
    }

    double* w=new double[n];//eigenvalues

    //the lapack function zheev
    clock_t t;
    t = clock();
    LAPACKE_zheev(LAPACK_COL_MAJOR,'V','U', n,reinterpret_cast< __complex__ double*>(matrix), n, w);
    t = clock() - t;
    cout<<"zheev : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    cout<<"largest eigenvalue="<<w[n-1]<<endl;

    std::complex<double> *wc=new std::complex<double>[n];
    std::complex<double> *vl=new std::complex<double>[n*n];
    std::complex<double> *vr=new std::complex<double>[n*n];

    t = clock();
    LAPACKE_zgeev(LAPACK_COL_MAJOR,'V','V', n,reinterpret_cast< __complex__ double*>(matrix2), n, reinterpret_cast< __complex__ double*>(wc),reinterpret_cast< __complex__ double*>(vl),n,reinterpret_cast< __complex__ double*>(vr),n);
    t = clock() - t;
    cout<<"zgeev : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    cout<<"largest eigenvalue="<<wc[0]<<endl;

    t = clock();
    LAPACKE_zheevd(LAPACK_COL_MAJOR,'V','U', n,reinterpret_cast< __complex__ double*>(matrix3), n, w);
    t = clock() - t;
    cout<<"zheevd : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    cout<<"largest eigenvalue="<<w[n-1]<<endl;

    t = clock();
    LAPACKE_zheevd(LAPACK_COL_MAJOR,'N','U', n,reinterpret_cast< __complex__ double*>(matrix4), n, w);
    t = clock() - t;
    cout<<"zheevd (no vector) : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    cout<<"largest eigenvalue="<<w[n-1]<<endl;

    delete[] w;
    delete[] wc;
    delete[] vl;
    delete[] vr;
    delete[] matrix;
    delete[] matrix2;
    delete[] matrix3;
    delete[] matrix4;
    return 0;
}

The output on my computer is:

zheev : 2.79 seconds
largest eigenvalue=81.9995
zgeev : 10.74 seconds
largest eigenvalue=(77.8421,0)
zheevd : 0.44 seconds
largest eigenvalue=81.9995
zheevd (no vector) : 0.02 seconds
largest eigenvalue=81.9995

These tests could also have been performed using the Armadillo library. Calling the LAPACK library directly might allow you to save some memory, but wrappers of LAPACK can be efficient in this respect as well.

The real question is whether you need all the eigenvectors, all the eigenvalues, or only the largest eigenvalues, because there are very efficient methods for the last case. Take a look at Arnoldi/Lanczos iterative algorithms. Huge memory gains are possible if the matrix is sparse, since only matrix-vector products are performed: there is no need to keep the dense format. This is what is done in the SLEPc library, which makes use of the sparse matrix formats of PETSc. Here is an example of SLEPc which can be used as a starting point.
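As a lighter-weight middle ground before setting up SLEPc/PETSc, Armadillo also provides a sparse iterative solver, eigs_sym(), which computes only a few eigenvalues and eigenvectors of a sparse symmetric matrix using Lanczos-type (ARPACK-style) iterations. A minimal sketch of my own for the real symmetric case, assuming an Armadillo build with sparse eigensolver support:

#include <armadillo>

int main()
{
    const arma::uword n = 3432;   // size of the L = 14 case from the question

    // sparse symmetric tridiagonal test matrix stored in Armadillo's CSC-based sp_mat
    arma::sp_mat A(n, n);
    for (arma::uword i = 0; i < n; ++i)
    {
        A(i, i) = 42.0;
        if (i + 1 < n)
        {
            A(i, i + 1) = 20.0;
            A(i + 1, i) = 20.0;
        }
    }

    arma::vec eigval;
    arma::mat eigvec;

    // compute only the 4 eigenvalues of largest magnitude ("lm") and their eigenvectors;
    // memory stays proportional to the number of non-zeros plus a few dense vectors
    arma::eigs_sym(eigval, eigvec, A, 4, "lm");

    eigval.print("largest eigenvalues:");
    return 0;
}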



Answer 2:

If someone has the same problem as me in the future, here are some tips now that I have found a solution (thanks to everyone who posted answers or clues!).

On the Intel site you can find some nice examples written in Fortran and C, for example the complex Hermitian eigenvalue problem routine zheev(): https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/zheev_ex.c.htm

To make it work in C++ you should edit a few lines in that code:

In the function prototype declarations, do the same for all functions: change extern void zheev( ... ) to extern "C" { void zheev( ... ); }

Change the LAPACK function calls by adding an underscore, for example zheev( ... ) becomes zheev_( ... ) (do it for all calls simply by substitution). I don't know exactly why this works; I figured it out by experimenting, but most likely it is because gfortran exports the Fortran routines with a trailing underscore, so the C++ code must refer to that symbol name.
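For illustration, the adapted prototype might look like this (a sketch on my part, assuming the usual gfortran convention of all arguments passed by pointer and Fortran symbols carrying a trailing underscore):

#include <complex>

extern "C" {
    void zheev_( char* jobz, char* uplo, int* n, std::complex<double>* a, int* lda,
                 double* w, std::complex<double>* work, int* lwork, double* rwork, int* info );
}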

Optionally, you can replace the printf calls with the standard stream std::cout and change the included header from stdio.h to iostream.

To compile, run a command like: g++ test.cpp -o test -llapack -lgfortran -lm -Wno-write-strings

About the last option, -Wno-write-strings: I don't know exactly what it does, but it most likely suppresses the warnings g++ emits because Intel's example passes string literals such as "Vectors" and "Lower" (which have type const char[] in C++) to functions expecting plain char*, as in zheev( "Vectors", "Lower", ... ).