My program manipulates STL vectors of integers but, from time to time, I need to calculate a few statistics on them, so I use the GSL functions. To avoid copying the STL vector into a GSL vector, I create a GSL vector view and pass it to the GSL functions, as in this piece of code:
#include <iostream>
#include <vector>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_statistics.h>
using namespace std;

int main( int argc, char* argv[] )
{
    vector<int> stl_v;
    for( int i=0; i<5; ++i )
        stl_v.push_back( i );

    gsl_vector_int_const_view gsl_v = gsl_vector_int_const_view_array( &stl_v[0], stl_v.size() );

    for( int i=0; i<stl_v.size(); ++i )
        cout << "gsl_v_" << i << "=" << gsl_vector_int_get( &gsl_v.vector, i ) << endl;

    cout << "mean=" << gsl_stats_mean( (double*) gsl_v.vector.data, 1, stl_v.size() ) << endl;
}
Once compiled (gcc -lstdc++ -lgsl -lgslcblas test.cpp), this code outputs this:
gsl_v_0=0
gsl_v_1=1
gsl_v_2=2
gsl_v_3=3
gsl_v_4=4
mean=5.73266e-310
The vector view is properly created but I don't understand why the mean is wrong (it should be equal to 10/5=2). Any idea? Thanks in advance.
Use the integer statistics functions: call gsl_stats_int_mean( gsl_v.vector.data, 1, stl_v.size() ) and drop the cast.
Note the gsl_stats_int_mean instead of gsl_stats_mean. The cast to double* is very suspicious. Any time you are tempted to use a cast, think again. Then look for a way to do it without a cast (maybe by introducing a temporary variable if the conversion is implicit). Then think a third time before you cast.
Since the memory region does not actually contain double values, the code is simply interpreting the bit patterns there as if they represented doubles, with predictably undesired effects. Casting an int* to double* is VERY different from casting each element of the array.
Unless you're doing a lot of statistics considerably more complex than the mean, I'd ignore gsl and just use standard algorithms:
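A minimal sketch of what "standard algorithms" could look like here, assuming C++11. The helper names mean_of and min_max_of are made up for illustration and are not part of any library; returning 0.0 for an empty vector is just a choice made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Arithmetic mean of a vector<int>. The 0.0 initial value makes
// std::accumulate sum in double, so no integer division takes place.
// An empty vector yields 0.0 (an arbitrary choice for this sketch).
double mean_of(const std::vector<int>& v)
{
    if (v.empty())
        return 0.0;
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Smallest and largest element in one pass; the vector must not be empty.
std::pair<int, int> min_max_of(const std::vector<int>& v)
{
    assert(!v.empty());
    auto range = std::minmax_element(v.begin(), v.end());
    return std::make_pair(*range.first, *range.second);
}
```

Both helpers read the ints in place, so there is no copy and no cast involved.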
When/if using a statistical library is justified, your first choice should probably be to look for something else that's better designed (e.g., Boost Accumulators).
If you decide, for whatever reason, that you really need to use gsl, it looks like you'll have to copy your array of ints to an array of doubles first, then use gsl on the result. This is obviously quite inefficient, especially if you're dealing with a lot of data, hence the previous advice to use something else instead.
Although I'm not familiar with GSL, the expression (double*) gsl_v.vector.data looks extremely suspicious. Are you sure it's correct to reinterpret_cast that pointer to get double data?
According to http://www.gnu.org/software/gsl/manual/html_node/Mean-and-standard-deviation-and-variance.html the gsl_stats_mean function takes an array of double. You're taking a vector of int and telling it to use the raw bytes as double, which isn't going to work right. You'll need to set up a temporary vector of double to pass in:
EDIT: You could also use standard library algorithms to do the int mean yourself:
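The two options in this answer might be sketched as follows. To keep the sketch buildable without GSL, mean_of_doubles is a hypothetical stand-in that only mirrors the parameter shape of gsl_stats_mean (pointer to doubles, stride, element count); in real code the temporary's data pointer would presumably be passed to gsl_stats_mean instead. All function names here are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in with the same parameter shape as gsl_stats_mean.
// It is NOT the GSL function, just a placeholder for this sketch.
double mean_of_doubles(const double* data, std::size_t stride, std::size_t n)
{
    assert(n > 0 && stride > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i * stride];
    return sum / static_cast<double>(n);
}

// Option 1: converting copy into a temporary vector<double>.
double mean_via_temporary(const std::vector<int>& ints)
{
    assert(!ints.empty());
    // The range constructor converts each int VALUE to a double,
    // which is what the (double*) cast in the question does not do.
    std::vector<double> tmp(ints.begin(), ints.end());
    return mean_of_doubles(tmp.data(), 1, tmp.size());
}

// Option 2: no copy; sum the ints as double and divide by the count.
double mean_direct(const std::vector<int>& ints)
{
    assert(!ints.empty());
    return std::accumulate(ints.begin(), ints.end(), 0.0) / static_cast<double>(ints.size());
}
```

For the vector 0..4 from the question, both functions give 2. The first costs an extra allocation and copy, the second does not.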
Casting to double* is messing up your data. It is not converting the data into double, but just using the int binary data as double.
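The difference between reusing the bytes and converting the value can be shown with a small sketch. It assumes a platform with 4-byte int and 8-byte double (checked by the static_assert); the function names are invented for illustration. std::memcpy is used so that the sketch itself has defined behaviour, whereas dereferencing a (double*) cast of an int buffer does not, although in practice it typically ends up reading the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

static_assert(sizeof(double) == 2 * sizeof(int),
              "sketch assumes 4-byte int and 8-byte double");

// Reinterpretation: the bytes of the first two ints are read as ONE double.
// No value conversion happens; the bit pattern is simply reused.
double first_bytes_as_double(const int* data, std::size_t count)
{
    assert(count * sizeof(int) >= sizeof(double));
    double d;
    std::memcpy(&d, data, sizeof d);
    return d;
}

// Conversion: the int VALUE is turned into the double with the same value.
double value_as_double(int i)
{
    return static_cast<double>(i);
}
```

With the ints 1 and 2, value_as_double(1) is exactly 1.0, while first_bytes_as_double yields a tiny denormal number unrelated to either input. That is the same kind of garbage as the mean=5.73266e-310 in the question, where gsl_stats_mean additionally reads past the end of the 20-byte int buffer because it expects five 8-byte doubles.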