OpenMP benchmark: Am I doing it right?

I wrote a program that calculates the Fibonacci sequence. I ran it with different numbers of threads (e.g. 1, 2, 10), but the execution time remained almost the same (about 0.500 seconds).

I'm using CodeBlocks on Ubuntu with the GNU GCC compiler. In CodeBlocks I linked the gomp library and set the -fopenmp flag for the compiler.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Forward declaration: fibo() is defined after main() */
void fibo(int sizeN, int n[]);

int main()
{
    int i, n=1000, a[n];
    omp_set_num_threads(4);

    for(i=0; i<n; i++)
    {
        a[i] = 1 + (rand() % ( 50 - 1 + 1 ) );
    }

    fibo(n, a);

    return 0;
}

void fibo(int sizeN, int n[])
{
    int i;

    #pragma omp parallel for
    for(i=0; i<sizeN; i++)
    {
        /* long long: F(47) and above overflow a 32-bit int */
        long long a = 0, b = 1, next;
        int c;
        printf("n = %i\n", n[i]);
        for (c=0; c<=n[i]; c++)
        {
            if (c <= 1)
            {
                next = c;
            }
            else
            {
                next = a + b;
                a = b;
                b = next;
            }
            printf("%lld\n",next);
        }
    }
}

Does anybody have an idea?
How can I make sure that OpenMP really works (is installed)?

Tags: c openmp benchmarking fibonacci

2 Answers

我只想做你的唯一

Reply #2 · 2019-09-13 21:27

Try larger, constant (not random) Fibonacci arguments and a larger sizeN. Then run the same values through a serial build (remove the #pragma lines and compile again) and compare the timings.

Also, your system needs more than one core for parallelism to bring any benefit.

Finally, if it compiles via -fopenmp then OpenMP is installed.


SAY GOODBYE

Reply #3 · 2019-09-13 21:36

Remove both printf statements. Your program is spending more time sending text to the standard output than computing the numbers. Since the standard output is basically serial, your program serialises in the printf statements. Not to mention the overhead of printf itself - it has to parse the format string, convert the integer value to a string and then send that to the stdout stream.

Consider these timings (n = 10000):

OMP_NUM_THREADS=1 ./fibo.exe  0.10s user 0.42s system 40% cpu 1.305 total
                                         ^^^^^^^^^^^^
OMP_NUM_THREADS=2 ./fibo.exe  0.24s user 1.01s system 95% cpu 1.303 total
                                         ^^^^^^^^^^^^
OMP_NUM_THREADS=4 ./fibo.exe  0.36s user 1.87s system 163% cpu 1.360 total
                                         ^^^^^^^^^^^^

I've removed the call to omp_set_num_threads() and used OMP_NUM_THREADS instead, which allows running the program with a varying number of threads without recompiling the source. Note that the program consistently spends about 4x more time in system mode than in user mode. This is the overhead of the text output.

Now compare the same with both printf statements commented out (note that I had to increase n to 1000000 in order to get meaningful results from time):

OMP_NUM_THREADS=1 ./fibo.exe  0.20s user 0.00s system 99% cpu 0.208 total
                                                              ^^^^^^^^^^^
OMP_NUM_THREADS=2 ./fibo.exe  0.21s user 0.00s system 179% cpu 0.119 total
                                                               ^^^^^^^^^^^
OMP_NUM_THREADS=4 ./fibo.exe  0.20s user 0.01s system 295% cpu 0.071 total
                                                               ^^^^^^^^^^^

Now the system time stays almost zero and the program is 1.75x faster with 2 threads and 2.93x faster with 4 threads. The speed-up is not linear since there is a slight imbalance in the work distribution among the threads. If the array is filled with constant values, then the speed-up is almost linear.
