I am working with a signal matrix and my goal is to calculate the sum of all elements of a row. The matrix is represented by the following struct:
typedef struct matrix {
float *data;
int rows;
int cols;
int leading_dim;
} matrix;
I have to mention the matrix is stored in column-major order (http://en.wikipedia.org/wiki/Row-major_order#Column-major_order), which should explain the formula column * tan_hd.rows + row
for retrieving the correct indices.
for(int row = 0; row < tan_hd.rows; row++) {
float sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for(int column = 0; column < tan_hd.cols; column++) {
sum += tan_hd.data[column * tan_hd.rows + row];
}
printf("row %d: %f", row, sum);
}
Without the OpenMP pragma, the delivered result is correct and looks like this:
row 0: 8172539.500000 row 1: 8194582.000000
As soon as I add the #pragma omp...
as described above, a different (wrong) result is returned:
row 0: 8085544.000000 row 1: 8107186.000000
In my understanding, reduction(+:sum)
creates private copies of sum
for each thread, and after completing the loop these partial results are summed up and written back to the global variable sum
again. What is it, that I am doing wrong?
I appreciate your suggestions!
Use the Kahan summation algorithm
By rewriting your code to implement it:
You can additionally switch all
float
todouble
to achieve a higher precision, but since your array is afloat
array, there should only be differences in the number of signficant numbers at the end.