OpenMP benchmark: Am I doing it right?

Posted 2019-09-13 21:15

Question:

I made a program which calculates the Fibonacci sequence. I executed it with different numbers of threads (e.g. 1, 2, 10), but the execution time remained almost the same (about 0.500 seconds).

I'm using CodeBlocks on Ubuntu and the GNU GCC compiler. In CodeBlocks I linked the library gomp and defined the flag -fopenmp for the compiler.

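#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

void fibo(int sizeN, int n[]);  /* declare before use in main() */

int main()
{
    int i, n=1000, a[n];
    omp_set_num_threads(4);

    /* fill a[] with random values in 1..50 */
    for(i=0; i<n; i++)
    {
        a[i] = 1 + (rand() % ( 50 - 1 + 1 ) );
    }

    fibo(n, a);

    return 0;
}

void fibo(int sizeN, int n[])
{
    int i;

    #pragma omp parallel for
    for(i=0; i<sizeN; i++)
    {
        /* declared inside the loop body, so private to each thread;
           long long because F(47) and above overflow an int */
        long long a = 0, b = 1, next;
        int c;
        printf("n = %i\n", n[i]);
        for (c=0; c<=n[i]; c++)
        {
            if (c <= 1)
            {
                next = c;
            }
            else
            {
                next = a + b;
                a = b;
                b = next;
            }
            printf("%lld\n", next);
        }
    }
}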

Does anybody have an idea?
How can I make sure that OpenMP really works (is installed)?

Answer 1:

Remove both printf statements. Your program is spending more time sending text to the standard output than computing the numbers. Since standard output is a single shared stream (each printf call locks it), your program serialises in the printf statements. Then there is the overhead of printf itself: it has to parse the format string, convert the integer value to a string and then send that to the stdout stream.

Observe these timings (n = 10000):

OMP_NUM_THREADS=1 ./fibo.exe  0.10s user 0.42s system 40% cpu 1.305 total
                                         ^^^^^^^^^^^^
OMP_NUM_THREADS=2 ./fibo.exe  0.24s user 1.01s system 95% cpu 1.303 total
                                         ^^^^^^^^^^^^
OMP_NUM_THREADS=4 ./fibo.exe  0.36s user 1.87s system 163% cpu 1.360 total
                                         ^^^^^^^^^^^^

I've removed the call to omp_set_num_threads() and used OMP_NUM_THREADS instead, which allows running the program with a varying number of threads without recompiling the source. Note that the program consistently spends about 4 to 5 times more time in system mode than in user mode. This is the overhead of that text output.

Now compare the same with both printf statements commented out (note that I had to increase n to 1000000 in order to get meaningful results from the time command):

OMP_NUM_THREADS=1 ./fibo.exe  0.20s user 0.00s system 99% cpu 0.208 total
                                                              ^^^^^^^^^^^
OMP_NUM_THREADS=2 ./fibo.exe  0.21s user 0.00s system 179% cpu 0.119 total
                                                               ^^^^^^^^^^^
OMP_NUM_THREADS=4 ./fibo.exe  0.20s user 0.01s system 295% cpu 0.071 total
                                                               ^^^^^^^^^^^

Now the system time stays almost zero, and the program is 1.75x faster with 2 threads and 2.93x faster with 4 threads. The speed-up is not linear because the random values make some iterations more expensive than others, so the work is slightly unevenly distributed among the threads. If the array is filled with constant values, the speed-up is almost linear.



Answer 2:

Try computing larger, constant (not random) Fibonacci numbers, and use larger values of sizeN. Then test with the same values using the serial implementation (remove the #pragmas and compile again).

Also, you need more than one core in your system to see any benefit from parallelism.

Finally, if the program compiles and links with -fopenmp, then OpenMP is installed.