I'm trying to define my own reduction for vectors of complex<float>, following this answer to the question Reducing on array in OpenMP.
But the size of my vectors aren't fixed at compile time, so I'm not sure how to define the initializer for the vector in the declare reduction
pragma. That is, I can't just have
initializer( omp_priv=TComplexVector(10,0) )
But the initializer is needed for vectors.
How can I pass the initializer clause the size of the vector I need at run time? What I have so far is below:
typedef std::vector<complex<float>> TCmplxVec;
void ComplexAdd(TCmplxVec & x,TCmplxVec & y){
for (int i=0;i<x.size();i++)
{
x.real()+= y.real();
//... same for imaginary part and other operations
}
}
#pragma omp declare reduction(AddCmplx: TCmplxVec: \
ComplexAdd(&omp_out, &omp_in)) initializer( \
omp_priv={TCmplxVec(**here I want a variable length**,0} )
void DoSomeOperation ()
{
//TCmplxVec vec is empty and anotherVec not
//so each thread runs the inner loop serially
#pragma omp parallel for reduction(AddCmplx: vec)
for ( n=0 ; n<10 ; ++n )
{
for (m=0; m<=someLength; ++m){
vec[m] += anotherVec[m+someOffset dependend on n and else];
}
}
}
You have to dig a little bit to find it online right now, but in section 2.15 of the OpenMP Standard, where user-declared reductions are discussed, you'll find that "The special identifier omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced."
So you could use initializer (omp_priv=TCmplxVec(omp_orig.size(),0))
, or just initalizer ( omp_priv(omp_orig) )
to initialize the vector in the reduction.
So the following works (note that you don't need to write your own routine; you can use std::transform and std::plus to add your vectors; you could also use std::valarray rather than vectors, depending on how you use them, which has operator+ already defined):
#include <complex>
#include <vector>
#include <algorithm>
#include <functional>
#include <iostream>
#include <omp.h>
typedef std::vector< std::complex<float> > TCmplxVec;
#pragma omp declare reduction( + : TCmplxVec : \
std::transform(omp_in.begin( ), omp_in.end( ), \
omp_out.begin( ), omp_out.begin( ), \
std::plus< std::complex<float> >( )) ) \
initializer (omp_priv(omp_orig))
int main(int argc, char *argv[]) {
int size;
if (argc < 2)
size = 10;
else
size = atoi(argv[1]);
TCmplxVec result(size,0);
#pragma omp parallel reduction( + : result )
{
int tid=omp_get_thread_num();
for (int i=0; i<std::min(tid+1,size); i++)
result[i] += tid;
}
for (int i=0; i<size; i++)
std::cout << i << "\t" << result[i] << std::endl;
return 0;
}
Running this gives
$ OMP_NUM_THREADS=1 ./reduction 8
0 (0,0)
1 (0,0)
2 (0,0)
3 (0,0)
4 (0,0)
5 (0,0)
6 (0,0)
7 (0,0)
$ OMP_NUM_THREADS=4 ./reduction 8
0 (6,0)
1 (6,0)
2 (5,0)
3 (3,0)
4 (0,0)
5 (0,0)
6 (0,0)
7 (0,0)
$ OMP_NUM_THREADS=8 ./reduction 8
0 (28,0)
1 (28,0)
2 (27,0)
3 (25,0)
4 (22,0)
5 (18,0)
6 (13,0)
7 (7,0)