I don't understand how BLAS, LAPACK and ATLAS are related and how I should use them together! I have been looking through all of their manuals and I have a general idea of BLAS and LAPACK and how to use them with the very few examples I find, but I can't find any actual examples using ATLAS to see how it is related with these two.
I am trying to do some low level work on matrixes and my primary language is C. First I wanted to use GSL, but it says that if you want the best performance you should use BLAS and ATLAS. Is there any good webpage giving some nice examples of how to use these (in C) all together? In other words I am looking for a tutorial on using these three (or any subset of them!). In short I am confused!
BLAS is a collection of low-level matrix and vector arithmetic operations (“multiply a vector by a scalar”, “multiply two matrices and add to a third matrix”, etc ...).
LAPACK is a collection of higher-level linear algebra operations. Things like matrix factorizations (LU, LLt, QR, SVD, Schur, etc) that are used to do things like “find the eigenvalues of a matrix”, or “find the singular values of a matrix”, or “solve a linear system”. LAPACK is built on top of the BLAS; many users of LAPACK only use the LAPACK interfaces and never need to be aware of the BLAS at all. LAPACK is generally compiled separately from the BLAS, and can use whatever highly-optimized BLAS implementation you have available.
ATLAS is a portable reasonably good implementation of the BLAS interfaces, that also implements a few of the most commonly used LAPACK operations.
What “you should use” depends somewhat on details of what you’re trying to do and what platform you’re using. You won't go too far wrong with “use ATLAS + LAPACK”, however.
While ago, when I started doing some linear algebra in C
, it came to me as a surprise to see there are so few tutorials for BLAS
, LAPACK
and other fundamental API
s, despite the fact that they are somehow the cornerstones of many other libraries. For that reason I started collecting all the examples/tutorials I could find all over the internet for BLAS
, CBLAS
, LAPACK
, CLAPACK
, LAPACKE
, ATLAS
, OpenBLAS
... in this Github repo.
Well, I should warn you that as a mechanical engineer I have little experience in managing such a git repository or GitHub. It will first seem as a complete mess to you guys. However if you manage to get over the messy structure you will find all kind of examples and instructions which might be a help. I have tried most of them, to be sure they compile. And the ones which do not compile I have mentioned. I have modified many of them to be compilable with GNU compilers
(gcc
, g++
and gfortran
). I have made MakeFile
s which you can read to learn how you can call individual Fortran/FORTRAN
routines in a C
or C++
program. I have also put some installations instructions for mac and linux (sorry windows guys!). I have also made some bash
.sh
files for automatic compilation of some of these libraries.
But going to your other question: BLAS
and LAPACK
are rather API
s not specific SDK
s. They are just a list of specifications or language extensions rather than an implementations or libraries. With that said, there are original implementations by Netlib in FORTRAN 77
, which most people refer to (confusingly!) when talking about BLAS
and LAPACK
. So if you see a lot of strange things when using these API
s is because you were actually calling FORTRAN
routines in C
rather than C
libraries and functions. ATLAS
and OpenBLAS
are some of the best implementations of BLAS
and LACPACK
as far as I know. They conform to the original API
, even though, to my knowledge they are implemented on C/C++
from scratch (not sure!). There are GPGPU implementations of the API
s using OpenCL
: CLBlast, clBLAS, clMAGMA, ArrayFire and ViennaCL to mention some. There are also vendor specific implementations optimized for specific hardware or platform, which I strongly discourage anybody to use them.
My recommendation to anyone who wants to learn using BLAS
and LAPACK
in C
is to learn FORTRAN-C
mixed programming first. The first chapter of the mentioned repo is dedicated to this matter and there I have collected many different examples.
P.S. I have been working on the dev branch of the repository time to time. It seems slightly less messy!
ATLAS is by now quite outdated. It was developed at a time when it was thought that optimizing the BLAS for various platforms was beyond the ability of humans, and as a result autogeneration and autotuning was the way to go.
In the early 2000s, along came Kazushige Goto, who showed how highly efficient implementations can be coded by hand. You may enjoy an interesting article in the New York Times: https://www.nytimes.com/2005/11/28/technology/writing-the-fastest-code-by-hand-for-fun-a-human-computer-keeps.html.
Kazushige's on the one hand had better insights into the theory behind high-performance implementations of matrix-matrix multiplication, and on the other hand engineered these better. His approach, which on current CPUs is usually the highest performing, is not in the search space that ATLAS autotunes. Hence, ATLAS is inherently inferior. Kazushige's implementation of the BLAS became known as the GotoBLAS. It was forked as the OpenBLAS when he joined industry.
The ideas behind the GotoBLAS were refactored into a new implementation, the BLAS-like Library Instantiation Software (BLIS) framework (https://github.com/flame/blis), which implements the same algorithms, but structures the code so that less needs to be custom implemented for a new architecture. BLIS is coded in C.
What this discussion shows is that there are many implementation of the BLAS. The BLAS themselves are a de facto standard for the interface. ATLAS was once the state of the art. It is no longer.
As far as I'm aware, and after working through the ATLAS repository, it seems that it includes a re-implementation of BLAS in C. There's bit more to it than that but I hope it answers the question.