I am considering implementing an array-like container, and I'm not sure whether to use a gsl_vector or a std::vector. The container needs to be space efficient but also very fast at element access. It will be referenced constantly in the main program, for example to feed values into tensor functions, among other things. I am reading from the container literally billions of times.
Here are the pros and cons that I've considered so far:
The gsl_vector is convenient because it will allow me to use the GSL BLAS routines on occasion, and the gsl_vector_get(...) call is very efficient. On the other hand, I can get almost the same access speed using STL iterators, and std::vector has an interface that I find quite natural.
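For concreteness, the two access idioms I am comparing look roughly like this (a minimal sketch; the sizes and the trivial summation are placeholders, not my real tensor code):

```cpp
#include <cstddef>
#include <numeric>
#include <vector>
#include <gsl/gsl_vector.h>

int main() {
    const std::size_t n = 1000;

    // GSL idiom: element access through gsl_vector_get.
    gsl_vector* gv = gsl_vector_calloc(n);   // zero-initialized
    double s1 = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s1 += gsl_vector_get(gv, i);

    // STL idiom: element access through iterators.
    std::vector<double> sv(n, 0.0);
    double s2 = std::accumulate(sv.begin(), sv.end(), 0.0);

    gsl_vector_free(gv);
    return static_cast<int>(s1 + s2);        // keep the sums observable
}
```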
Are there any memory overheads or efficiency issues I should be aware of that I've overlooked in the above?
Also, I am using a std::vector<gsl_vector*> implementation at the moment, with an iterator to traverse the std::vector. Would it be smarter to use a gsl_matrix here? The idea would be to use gsl_vector_views to get at the right vector, rather than the iterator. Would that be more efficient?
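Sketched side by side, the two layouts in question would be something like this (again illustrative; my actual container holds far more rows):

```cpp
#include <cstddef>
#include <vector>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_matrix.h>

int main() {
    const std::size_t nrows = 1000, ncols = 3;

    // Current approach: each row is a separately allocated gsl_vector.
    std::vector<gsl_vector*> rows(nrows);
    for (auto& r : rows)
        r = gsl_vector_calloc(ncols);                     // one malloc per row

    double sum = 0.0;
    for (auto it = rows.begin(); it != rows.end(); ++it)  // iterator traversal
        sum += gsl_vector_get(*it, 0);

    // Alternative: one gsl_matrix, rows accessed through views.
    gsl_matrix* m = gsl_matrix_calloc(nrows, ncols);      // single allocation
    for (std::size_t i = 0; i < nrows; ++i) {
        gsl_vector_view row = gsl_matrix_row(m, i);       // no allocation, just a view
        sum += gsl_vector_get(&row.vector, 0);
    }

    gsl_matrix_free(m);
    for (auto r : rows)
        gsl_vector_free(r);
    return static_cast<int>(sum);
}
```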
On one hand, it is true that with gsl_vector you can use the GSL BLAS routines, which is a big advantage. On the other hand, it is also true that the GSL interface is quite cumbersome for a C++ programmer. So neither solution is truly satisfactory. However, I strongly prefer the use of gsl_matrix, because:

(i) With some effort you can write a small wrapper class that ameliorates the cumbersome C interface of gsl_matrix (it is much harder to deal with the lack of a BLAS interface in std::vector); a minimal sketch of such a wrapper is given after this list.
(ii) gsl_matrix is just a wrapper around a one-dimensional contiguous array, where

m(i,j) = array[i*N + j]

for a square matrix (even if the matrix is not square, gsl_matrix still implements it as a one-dimensional array). With std::vector<gsl_vector*>, you need to malloc each gsl_vector individually, which means the memory won't be contiguous. This hurts performance, because the lack of spatial locality in the memory allocation usually increases cache misses substantially.
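Here is a minimal sketch of the wrapper class mentioned in point (i) (illustrative only: error handling and copy semantics are stripped down). It also makes the contiguous data[i*tda + j] layout from point (ii) explicit:

```cpp
#include <cstddef>
#include <gsl/gsl_matrix.h>

// Minimal RAII wrapper over gsl_matrix: frees the matrix automatically
// and replaces gsl_matrix_get/set calls with a natural operator().
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : m_(gsl_matrix_calloc(rows, cols)) {}
    ~Matrix() { gsl_matrix_free(m_); }
    Matrix(const Matrix&) = delete;             // keep ownership simple
    Matrix& operator=(const Matrix&) = delete;

    double& operator()(std::size_t i, std::size_t j) {
        // All elements live in one contiguous array; element (i,j)
        // sits at data[i*tda + j], which is exactly why gsl_matrix
        // beats std::vector<gsl_vector*> on cache locality.
        return m_->data[i * m_->tda + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        return m_->data[i * m_->tda + j];
    }

    gsl_matrix* raw() { return m_; }            // pass-through for GSL BLAS calls
private:
    gsl_matrix* m_;
};
```

With such a wrapper, day-to-day indexing reads like m(i, j), while m.raw() can still be handed to GSL BLAS routines such as gsl_blas_dgemm.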
If you have the choice to use a completely different solution, I would implement the tensor calculation using the StaticMatrix or DynamicMatrix classes in the Blaze lib.
Why Blaze?
(i) The StaticMatrix and DynamicMatrix interfaces are much better than std::vector<gsl_vector*> or gsl_matrix.

(ii) Blaze is the fastest BLAS lib available in C++. It is faster than gsl if you have Intel MKL available (remember that Intel MKL is faster than gsl's BLAS). Why so? Because Blaze uses a technique called "Smart Expression Templates". Basically, researchers in Germany showed in a series of articles (paper 1, paper 2) that the classic "Expression Template" technique, the standard in many C++ BLAS libraries, is terrible for matrix operations (BLAS 3 operations), because the compiler can't be smarter than hand-tuned low-level code. However, expression templates can be used as a smart wrapper around low-level BLAS libraries like Intel MKL. So they created the "Smart Expression Template" technique, which is just such a wrapper around your choice of low-level BLAS lib. Their benchmarks are astonishing (benchmark).
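For a flavor of the interface, here is a minimal Blaze sketch (assuming Blaze is installed; whether it delegates to MKL or another backend is a build-time configuration not shown here):

```cpp
#include <iostream>
#include <blaze/Math.h>

int main() {
    // Fixed-size matrix: dimensions known at compile time.
    blaze::StaticMatrix<double, 3UL, 3UL> a{ {1, 2, 3},
                                             {4, 5, 6},
                                             {7, 8, 9} };

    // Heap-allocated matrix: dimensions chosen at run time.
    blaze::DynamicMatrix<double> b(3UL, 3UL, 1.0);

    // The smart expression template machinery decides how to evaluate
    // this product, delegating large cases to the configured BLAS kernels.
    blaze::DynamicMatrix<double> c = a * b;

    std::cout << c(0, 0) << '\n';  // prints 6 (= 1 + 2 + 3)
    return 0;
}
```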