Can anyone recommend a portable SIMD library that provides a C/C++ API, supports both Intel and AMD instruction-set extensions, and compiles with Visual Studio and GCC? I'm looking to speed up things like scaling a 512x512 array of doubles, vector dot products, matrix multiplication, etc.
So far the only one I've found is http://simdx86.sourceforge.net/ but, as its very first page says, it doesn't compile with Visual Studio.
There's also Intel IPP, which doesn't work on AMD from what I gather. And there's Framewave from AMD, but I had problems compiling and linking their library, and their forums are completely dead. Has anyone managed to use Framewave anywhere?
Thanks.
Since you mention high-level operations on matrices and vectors, ATLAS, Intel's MKL, PLASMA, and FLAME may be of interest.
Some C++ matrix math libraries include uBLAS from Boost, Armadillo, Eigen, IT++, and Newmat. The POOMA library probably also includes some of these things. This question also refers to MTL.
If you're looking for lower-level portability primitives, a colleague of mine has developed a wrapper around SSE2, Altivec, VSX, Larrabee, and Cell SPE vector operations. It can be found in our source repository, but its licensing (academic) may not be appropriate if you want to distribute it as part of your work. It is also still under significant development to cover the range of application needs that it's targeted at.
Check out macstl: http://www.pixelglow.com/macstl/
Try liboil or the related ORC. ORC is especially interesting: it implements a high-level assembly language that is compiled into architecture-specific code. It's pretty sophisticated, much more so than a simple wrapper library.
Eigen is an MPL2-licensed, header-only C++ library whose vector/matrix math is optimized for SSE, Neon, and Altivec. More sophisticated math operations are available in its add-on modules.
If you don't mind working close to the assembly level, you can always use the compiler's intrinsic functions for the SIMD instructions. They are processor specific, i.e. SSE4 intrinsics will only run on SSE4-enabled CPUs, and it's up to you to make sure the extensions are there.
There is a good article here about applying SIMD.
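As a rough illustration of the intrinsics approach, here is a minimal sketch of two of the operations from the question (scaling an array of doubles and a dot product) using SSE2 only. The function names `scale_array` and `dot` and the `HAVE_SSE2` macro are made up for this example. It assumes SSE2 is detected at compile time (it is part of the x86-64 baseline) and falls back to plain scalar code elsewhere; anything newer than SSE2 would need a runtime CPU check, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Compile-time detection: GCC/Clang define __SSE2__, MSVC defines _M_X64
// on 64-bit targets or _M_IX86_FP >= 2 on 32-bit targets built with /arch:SSE2.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1     // hypothetical macro name, used only in this sketch
#else
#define HAVE_SSE2 0
#endif

// Multiply every element of data[0..n) by factor, in place.
void scale_array(double* data, std::size_t n, double factor) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
#if HAVE_SSE2
    const __m128d f = _mm_set1_pd(factor);          // broadcast factor to both lanes
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);         // unaligned load of 2 doubles
        _mm_storeu_pd(data + i, _mm_mul_pd(v, f));  // multiply and store back
    }
#endif
    for (; i < n; ++i) data[i] *= factor;           // scalar tail (or full fallback)
}

// Dot product of a[0..n) and b[0..n).
double dot(const double* a, const double* b, std::size_t n) {
    std::size_t i = 0;
    double sum = 0.0;
#if HAVE_SSE2
    __m128d acc = _mm_setzero_pd();                 // two running partial sums
    for (; i + 2 <= n; i += 2) {
        __m128d prod = _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i));
        acc = _mm_add_pd(acc, prod);
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);                      // spill both lanes
    sum = lanes[0] + lanes[1];                      // horizontal add
#endif
    for (; i < n; ++i) sum += a[i] * b[i];          // scalar tail (or full fallback)
    return sum;
}
```

For a 512x512 matrix stored contiguously you would call `scale_array(m, 512 * 512, factor)`. Note that the SIMD dot product adds terms in a different order than a plain loop, so results can differ in the last bits for general floating-point input.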
You could, however, use a compiler that generates SIMD code for you without any external libraries. VectorC is supposed to be good, although I've never used it personally. It doesn't require any special libraries as far as I know; it just spots those bits of source code that can benefit from SIMD and compiles them to whatever level of SSE you specify.
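The same idea applies to mainstream compilers: GCC and Visual Studio both have auto-vectorizers that kick in at higher optimization levels (for example `-O3` with GCC, `/O2` with Visual Studio). A minimal sketch of the kind of loop they can usually handle is below; the function name `scale_plain` is made up, and whether a given loop is actually vectorized depends on the compiler version and flags, so treat this as an assumption to verify with the compiler's vectorization report.

```cpp
#include <cassert>
#include <cstddef>

// Plain scalar loop with no intrinsics. __restrict (accepted by both GCC and
// MSVC in C++) promises the compiler that out and in do not overlap, which
// removes one common obstacle to auto-vectorization.
void scale_plain(double* __restrict out, const double* __restrict in,
                 std::size_t n, double factor) {
    assert(out != in || n == 0);  // the no-overlap promise must actually hold
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;  // independent iterations, unit stride
    }
}
```

With GCC, `-fopt-info-vec` reports which loops were vectorized; with Visual Studio, `/Qvec-report:1` does the same.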