I noticed that my Boost MultiArrays were performing very badly compared to an STL vector. I came upon this question asked earlier, where the most-liked answer stated that:
1) Boost is nearly as fast as a native array.
2) You need to change the order in which you access your data elements to get the best performance out of Boost MultiArray. You also need to build in Release mode, not Debug.
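To make the access-order point concrete, here is a minimal sketch of row-major (C order) addressing, which is the default storage order of boost::multi_array. It does not use Boost itself; `rowMajorOffset` and the two stride helpers are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Boost): linear offset of element [x][y]
// in a 2-D array stored in row-major (C) order. The LAST index is the one
// that walks through adjacent memory.
inline std::size_t rowMajorOffset(std::size_t x, std::size_t y,
                                  std::size_t ySize)
{
    assert(y < ySize);  // precondition: y must lie inside a row
    return x * ySize + y;
}

// Distance, in elements, between two consecutive accesses when the
// inner loop runs over y (the last index).
inline std::size_t strideWhenInnerLoopIsY(std::size_t ySize)
{
    return rowMajorOffset(0, 1, ySize) - rowMajorOffset(0, 0, ySize);
}

// Distance, in elements, between two consecutive accesses when the
// inner loop runs over x (the first index).
inline std::size_t strideWhenInnerLoopIsX(std::size_t ySize)
{
    return rowMajorOffset(1, 0, ySize) - rowMajorOffset(0, 0, ySize);
}
```

Under this layout, an inner loop over y touches neighbouring elements, while an inner loop over x jumps a whole row on every step.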
Well, I did all that, and yet the performance of my MultiArrays is pretty shabby.
Here is my code:
A) WITH DEFAULT ORDERING
#include <windows.h>
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS
#include <boost/multi_array.hpp>
#include <stdio.h>
#include <conio.h>
#include <iostream>
int main(int argc, char* argv[])
{
    const int X_SIZE = 400;
    const int Y_SIZE = 400;
    const int ITERATIONS = 500;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);

    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    //------------------Measure boost----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                boostMatrix[x][y] *= 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    //------------------Measure native-----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                nativeMatrix[x + (y * X_SIZE)] *= 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    return 0;
}
B) WITH INVERTED ORDERING
#include <windows.h>
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS
#include <boost/multi_array.hpp>
#include <stdio.h>
#include <conio.h>
#include <iostream>
int main(int argc, char* argv[])
{
    const int X_SIZE = 400;
    const int Y_SIZE = 400;
    const int ITERATIONS = 500;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);

    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    //------------------Measure boost----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int x = 0; x < X_SIZE; ++x)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                boostMatrix[x][y] *= 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    //------------------Measure native-----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int x = 0; x < X_SIZE; ++x)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                nativeMatrix[x + (y * X_SIZE)] *= 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    return 0;
}
With either loop ordering, my benchmark results are approximately the same:
1) For Native code : 0.15s
2) For Boost MultiArray : 1.0s
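The timings above come from GetTickCount, whose resolution is typically in the 10 to 16 ms range. As a cross-check, here is a minimal sketch of a timing helper built on C++11 `<chrono>`, which is an assumption here since that header is not available in Visual Studio 2010. It also spells out the two traversal orders for the flat native buffer, which is indexed as `x + y * X_SIZE`, so x is its contiguous index. The names `secondsTaken`, `scaleUnitStride` and `scaleStrided` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed seconds.
template <typename Work>
double secondsTaken(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Scales a flat xSize * ySize buffer indexed as x + y * xSize.
// x is the inner loop, so consecutive accesses are adjacent in memory.
inline void scaleUnitStride(std::vector<double>& m, std::size_t xSize,
                            std::size_t ySize, double factor)
{
    assert(m.size() == xSize * ySize);
    for (std::size_t y = 0; y < ySize; ++y)
        for (std::size_t x = 0; x < xSize; ++x)
            m[x + y * xSize] *= factor;
}

// Same work with y as the inner loop: consecutive accesses are
// xSize elements apart.
inline void scaleStrided(std::vector<double>& m, std::size_t xSize,
                         std::size_t ySize, double factor)
{
    assert(m.size() == xSize * ySize);
    for (std::size_t x = 0; x < xSize; ++x)
        for (std::size_t y = 0; y < ySize; ++y)
            m[x + y * xSize] *= factor;
}
```

A call such as `secondsTaken([&] { scaleUnitStride(m, 400, 400, 2.345); })` times one pass; repeating it in a loop, as the benchmarks above do, gives a more stable figure. Both functions produce the same values, so any timing difference between them comes from the access pattern alone.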
I am using Visual Studio 2010.
My question is: given that I am using Visual Studio, how can I get good performance from Boost MultiArrays?
UPDATE:
I switched over to Visual Studio 2013. There, I enabled the /Qvec-report:2 compiler switch. Very interestingly, when I compiled, I started receiving info messages saying that the compiler was failing to vectorize. Here is a sample info message, which looks almost like a warning. I received several such messages for the simplest of code.
--- Analyzing function: void __cdecl `vector constructor iterator'(void * __ptr64,unsigned __int64,int,void * __ptr64 (__cdecl*)(void * __ptr64))
1> D:\Workspace\test\Scrap\Scrap\Source.cpp : info C5002: loop not vectorized due to reason '1301'
I think this is a major clue as to why Boost multiarrays are performing slower on my Visual Studio while they perform alright on GCC. Given this extra information, can you please think of a way to resolve the problem?
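For reference, Microsoft's list of vectorizer messages describes reason 1301 as a loop whose stride is not +1. The sketch below shows what a unit-stride loop over contiguous storage looks like. It uses std::vector as a stand-in so that it compiles without Boost; `scaleContiguous` and `scaleAll` are hypothetical names, and whether this form removes the slowdown seen here is an untested assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scales `count` contiguous doubles in place.
// A single loop with stride +1 and no per-element index arithmetic.
inline void scaleContiguous(double* data, std::size_t count, double factor)
{
    assert(data != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        data[i] *= factor;
}

// Stand-in for a container with contiguous storage: the pointer and
// element count are fetched once and handed to the flat loop.
inline void scaleAll(std::vector<double>& buffer, double factor)
{
    if (!buffer.empty())
        scaleContiguous(buffer.data(), buffer.size(), factor);
}
```

boost::multi_array also stores its elements contiguously and provides `data()` and `num_elements()`, so for an element-wise operation like the one in the benchmarks, the equivalent call would be `scaleContiguous(boostMatrix.data(), boostMatrix.num_elements(), 2.345);`.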
@Admins: Kindly unmark my question as previously answered. I have made a major edit.