I wrote a program that calculates the Fibonacci sequence. I ran it with different numbers of threads (e.g. 1, 2, 10), but the execution time remained almost the same (about 0.500 seconds).
I'm using CodeBlocks on Ubuntu and the GNU GCC compiler. In CodeBlocks I linked the library gomp and defined the flag -fopenmp for the compiler.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Declare fibo() before main() calls it. */
void fibo(int sizeN, int n[]);

int main()
{
    int i, n=1000, a[n];
    omp_set_num_threads(4);
    /* Fill the array with random values in the range 1..50. */
    for(i=0; i<n; i++)
    {
        a[i] = 1 + (rand() % ( 50 - 1 + 1 ) );
    }
    fibo(n, a);
    return 0;
}

void fibo(int sizeN, int n[])
{
    int i;
    #pragma omp parallel for
    for(i=0; i<sizeN; i++)
    {
        /* long long: Fibonacci numbers above F(46) overflow a 32-bit int. */
        long long a = 0, b = 1, next;
        int c;
        printf("n = %i\n", n[i]);
        for (c=0; c<=n[i]; c++)
        {
            if (c <= 1)
            {
                next = c;
            }
            else
            {
                next = a + b;
                a = b;
                b = next;
            }
            printf("%lld\n",next);
        }
    }
}
Does anybody have an idea?
How can I make sure that OpenMP really works (is installed)?
Try asking for larger constant (not random) Fibonacci values, and larger values of sizeN. Then test with the same values using the serial implementation (remove the #pragma lines and compile again).

Also, you need more than one core in your system to see any benefit from parallelism.
Finally, if it compiles with -fopenmp, then OpenMP is installed.

Remove both printf statements. Your program is spending more time sending text to the standard output than computing the numbers. Since the standard output is basically serial, your program serialises in the printf statements. Not to mention the overhead of printf itself: it has to parse the format string, convert the integer value to a string, and then send that to the stdout stream.

Observe these timings (n = 10000):

I've removed the call to omp_set_num_threads() and use OMP_NUM_THREADS instead, which allows running the program with a varying number of threads without recompiling the source. Note that the program consistently spends about 4x more time in system mode than in user mode. This is the overhead of that text output.

Now compare the same with both printf statements commented out (note that I had to increase n to 1000000 in order to get meaningful results from time):

Now the system time stays almost zero and the program is 1.75x faster with 2 threads and 2.93x faster with 4 threads. The speed-up is not linear since there is a slight imbalance in the work distribution among the threads. If the array is filled with constant values, then the speed-up is almost linear.