OpenMP parallel for reduction delivers wrong results

I am working with a signal matrix and my goal is to calculate the sum of all elements of a row. The matrix is represented by the following struct:

typedef struct matrix {
  float *data;
  int rows;
  int cols;
  int leading_dim;
} matrix;

Note that the matrix is stored in column-major order (http://en.wikipedia.org/wiki/Row-major_order#Column-major_order), which explains the formula column * tan_hd.rows + row used to compute the index of an element.
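To make the indexing concrete, here is a minimal sketch of a column-major accessor and a serial row sum. The helper names matrix_at and row_sum are hypothetical, and the sketch assumes the stride between columns is rows, as in the formula above; leading_dim is ignored here.

```c
#include <stddef.h>

typedef struct matrix {
  float *data;
  int rows;
  int cols;
  int leading_dim;
} matrix;

/* Hypothetical helper: element (row, col) of a column-major matrix.
   Assumes the stride between columns is m->rows (leading_dim is ignored). */
static float matrix_at(const matrix *m, int row, int col) {
    return m->data[(size_t)col * m->rows + row];
}

/* Hypothetical helper: serial sum of all elements in one row. */
static float row_sum(const matrix *m, int row) {
    float sum = 0.0f;
    for (int col = 0; col < m->cols; col++) {
        sum += matrix_at(m, row, col);
    }
    return sum;
}
```

For a 2x3 matrix stored as {1, 2, 3, 4, 5, 6}, row 0 holds 1, 3, 5 and row 1 holds 2, 4, 6, because consecutive entries in memory walk down a column first.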

for(int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for(int column = 0; column < tan_hd.cols; column++) {
        sum += tan_hd.data[column * tan_hd.rows + row];
    }
    printf("row %d: %f", row, sum);
}

Without the OpenMP pragma, the delivered result is correct and looks like this:

row 0: 8172539.500000 row 1: 8194582.000000

As soon as I add the #pragma omp ... line shown above, a different (wrong) result is returned:

row 0: 8085544.000000 row 1: 8107186.000000

As I understand it, reduction(+:sum) creates a private copy of sum for each thread, and after the loop completes these partial results are added up and written back to the shared variable sum. What am I doing wrong?
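The reduction semantics described here can be sketched by hand roughly as follows. This is only an illustration: the function name row_sum_manual is hypothetical, and a real OpenMP runtime is free to combine the partial results in a different way and order.

```c
/* Hypothetical sketch of what reduction(+:sum) does conceptually
   for one row of a column-major float matrix. */
static float row_sum_manual(const float *data, int rows, int cols, int row) {
    float sum = 0.0f;
    #pragma omp parallel
    {
        float partial = 0.0f;   /* per-thread private copy, starts at 0 for + */
        #pragma omp for
        for (int column = 0; column < cols; column++) {
            partial += data[column * rows + row];
        }
        #pragma omp critical
        sum += partial;         /* combine the partial results one thread at a time */
    }
    return sum;
}
```

Each partial sum is rounded on its own and the partials are combined in an unspecified order, so with float data the total can differ from the one produced by the serial loop.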

I appreciate your suggestions!

Tags: c parallel-processing openmp reduction

1 answer

家丑人穷心不美

Post #2 · 2019-05-28 16:23

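Use the Kahan summation algorithm. Floating-point addition is not associative, so the order in which the reduction adds the values and combines the per-thread partial sums changes the rounding error. Compensated summation keeps that error small.

It has the same algorithmic complexity as a naive summation.
It greatly increases the accuracy of a summation without requiring you to switch data types to double.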
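A minimal serial sketch of the difference, independent of OpenMP, might look like this. The function names sum_naive and sum_kahan are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Naive left-to-right float summation. */
static float sum_naive(const float *x, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Kahan (compensated) summation: c tracks the low-order bits
   that are lost each time a small value is added to a large total. */
static float sum_kahan(const float *x, size_t n) {
    float sum = 0.0f, c = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - c;     /* apply the correction from the previous step */
        float t = sum + y;      /* low-order bits of y may be lost here */
        c = (t - sum) - y;      /* recover the lost part (with negated sign) */
        sum = t;
    }
    return sum;
}
```

With an input of one 1.0f followed by many values that are each too small to change a float running total (for example 1e-8f), the naive loop stays at 1.0 while the compensated loop picks up their combined contribution. Aggressive floating-point optimizations such as -ffast-math may remove the compensation, so this sketch assumes they are not enabled.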

By rewriting your code to implement it:

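for(int row = 0; row < tan_hd.rows; row++) {
    // sum is the running total, c the running compensation for lost low-order bits
    float sum = 0.0f, c = 0.0f;
    // one reduction clause with a list: each thread gets private, zero-initialised copies of sum and c
    #pragma omp parallel for reduction(+:sum, c)
    for(int column = 0; column < tan_hd.cols; column++) {
        float y = tan_hd.data[column * tan_hd.rows + row] - c; // apply the previous correction
        float t = sum + y;                                      // low-order bits of y may be lost here
        c = (t - sum) - y;                                      // recover the lost part (negated)
        sum = t;
    }
    sum = sum - c; // apply the compensation accumulated across all threads
    printf("row %d: %f", row, sum);
}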

You can additionally switch sum and c from float to double to achieve higher precision, but since your array is a float array, the results should differ only in the last few significant digits.
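As a sketch of the double variant, assuming only the accumulator changes type while the array stays float (the function name row_sum_double is hypothetical):

```c
/* Hypothetical helper: sum one row of a column-major float matrix,
   accumulating in double to reduce rounding error. */
static double row_sum_double(const float *data, int rows, int cols, int row) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int column = 0; column < cols; column++) {
        sum += data[column * rows + row];   /* float is widened to double before adding */
    }
    return sum;
}
```

Printing the result with %f works unchanged, since printf promotes float arguments to double anyway.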
