I'm running a completely parallel matrix multiplication program on a Mac Pro with a Xeon processor. I create 8 threads (one per core), and there are no shared-write issues (no two threads write to the same location). For some reason, my version using `pthread_create` and `pthread_join` is about twice as slow as the version using `#pragma omp`.
There are no other differences between the two versions: same compile options, same number of threads, and the same code (apart from the pragma/pthread portions, obviously).
The loops are also very big; I'm not parallelizing small loops.
(I can't really post the code because it's school work.)
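To make the comparison concrete without posting the assignment, here is a minimal sketch of the two kinds of setup being compared. It is not my actual code: the matrix size `N`, the `RowRange` struct, the row-block partitioning, and every function name are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define N 64           /* hypothetical matrix size, kept small for the sketch */
#define NUM_THREADS 8  /* one thread per core, as described above */

double A[N][N], B[N][N], C_pthread[N][N], C_omp[N][N];

/* Hypothetical work descriptor: each thread gets a contiguous block of rows. */
typedef struct {
    int row_begin;
    int row_end;
} RowRange;

/* Fill the inputs with small integers so every product and sum is exact. */
void fill_inputs(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
        }
}

/* Thread body: computes rows [row_begin, row_end) of C_pthread. */
static void *multiply_rows(void *arg) {
    const RowRange *r = arg;
    for (int i = r->row_begin; i < r->row_end; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C_pthread[i][j] = sum;
        }
    return NULL;
}

/* Variant 1: explicit pthread_create / pthread_join. */
void multiply_pthreads(void) {
    pthread_t threads[NUM_THREADS];
    RowRange ranges[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++) {
        ranges[t].row_begin = t * N / NUM_THREADS;
        ranges[t].row_end = (t + 1) * N / NUM_THREADS;
        int rc = pthread_create(&threads[t], NULL, multiply_rows, &ranges[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
}

/* Variant 2: the same loop nest with an OpenMP pragma on the outer loop.
   If OpenMP is not enabled at compile time, the pragma is ignored and the
   loop simply runs serially. */
void multiply_openmp(void) {
    #pragma omp parallel for num_threads(NUM_THREADS)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C_omp[i][j] = sum;
        }
}
```

In a sketch like this, both variants run the same loop nest over the same data; only the way the threads are created and the rows are handed out differs.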
Why might this be happening? Doesn't OpenMP use POSIX threads itself? How can it be faster?