Any ideas why it works fine for values like 0, 1, 2, 3, 4... and seg faults for values like >15? #include #include #include
void *fib(void *fibToFind);
main(){
pthread_t mainthread;
long fibToFind = 15;
long finalFib;
pthread_create(&mainthread,NULL,fib,(void*) fibToFind);
pthread_join(mainthread,(void*)&finalFib);
printf("The number is: %d\n",finalFib);
}
void *fib(void *fibToFind){
long retval;
long newFibToFind = ((long)fibToFind);
long returnMinusOne;
long returnMinustwo;
pthread_t minusone;
pthread_t minustwo;
if(newFibToFind == 0 || newFibToFind == 1)
return newFibToFind;
else{
long newFibToFind1 = ((long)fibToFind) - 1;
long newFibToFind2 = ((long)fibToFind) - 2;
pthread_create(&minusone,NULL,fib,(void*) newFibToFind1);
pthread_create(&minustwo,NULL,fib,(void*) newFibToFind2);
pthread_join(minusone,(void*)&returnMinusOne);
pthread_join(minustwo,(void*)&returnMinustwo);
return returnMinusOne + returnMinustwo;
}
}
I tried to run your code, and came across several surprises:
This line has a small error:
%d
meansprintf
expects anint
, but is passed along int
. On most platforms this is the same, or will have the same behavior anyways, but pedantically speaking (or if you just want to stop the warning from coming up, which is a very noble ideal too), you should use%ld
instead, which will expect along int
.Your
fib
function, on the other hand, seems non-functional. Testing it on my machine, it doesn't crash, but it yields1047
, which is not a Fibonacci number. Looking closer, it seems your program is incorrect on several aspects:Always take care of compiler warnings: when you get one, usually, you really are doing something fishy.
Maybe you should revise the algorithm a little: right now, all your function does is returning the sum of two undefined values, hence the 1047 I got earlier.
Implementing the Fibonacci suite using a recursive algorithm means you need to call the function again. As others noted, it's quite an inefficient way of doing it, but it's easy, so I guess all computer science teachers use it as an example.
The regular recursive algorithm looks like this:
I don't know to which extent you were supposed to use threads—just run the algorithm on a secondary thread, or create new threads for each call? Let's assume the first for now, since it's a lot more straightforward.
Casting integers to pointers and vice-versa is a bad practice because if you try to look at things at a higher level, they should be widely different. Integers do maths, and pointers resolve memory addresses. It happens to work because they're represented the same way, but really, you shouldn't do this. Instead, you might notice that the function called to run your new thread accepts a
void*
argument: we can use it to convey both where the input is, and where the output will be.So building upon my previous
fibonacci
function, you could use this code as the thread main routine:It expects a pointer to an integer, and takes from it its input, then writes it output there.1 You would then create the thread like that:
If you need to call the Fibonacci function from new distinct threads (please note: that's not what I'd advise, and others seem to agree with me; it will just blow up for a sufficiently large amount of iterations), you'll first need to merge the
fibonacci
function with thefibonacci_offshored
function. It will considerably bulk it up, because dealing with threads is heavier than dealing with regular functions.Now that you have this bulky function, adjustments to your
main
function are going to be quite easy: just change the reference tofibonacci_offshored
tothreaded_fibonacci
.You might have been told that threads speed up parallel processes, but there's a limit somewhere where it's more expensive to set up the thread than run its contents. This is a very good example of such a situation: the threaded version of the program runs much, much slower than the non-threaded one.
For educational purposes, this program runs out of threads on my machine when the number of desired iterations is 18, and takes a few seconds to run. By comparison, using an iterative implementation, we never run out of threads, and we have our answer in a matter of milliseconds. It's also considerably simpler. This would be a great example of how using a better algorithm fixes many problems.
Also, out of curiosity, it would be interesting to see if it crashes on your machine, and where/how.
1. Usually, you should try to avoid to change the meaning of a variable between its value on input and its value after the return of the function. For instance, here, on input, the variable is the number of iterations we want; on output, it's the result of the function. Those are two very different meanings, and that's not really a good practice. I didn't feel like using dynamic allocations to return a value through the
void*
return value.You are not checking for errors - in particular, from
pthread_create()
. Whenpthread_create()
fails, thepthread_t
variable is left undefined, and the subsequentpthread_join()
may crash.If you do check for errors, you will find that
pthread_create()
is failing. This is because you are trying to generate almost 2000 threads - with default settings, this would require 16GB of thread stacks to be allocated alone.You should revise your algorithm so that it does not generate so many threads.
Runs out of memory (out of space for stacks), or valid thread handles?
You're asking for an awful lot of threads, which require lots of stack/context. Windows (and Linux) have a stupid "big [contiguous] stack" idea.
From the documentation on pthreads_create: "On Linux/x86-32, the default stack size for a new thread is 2 megabytes." If you manufacture 10,000 threads, you need 20 Gb of RAM. I built a version of OP's program, and it bombed with some 3500 (p)threads on Windows XP64.
See this SO thread for more details on why big stacks are a really bad idea: Why are stack overflows still a problem?
If you give up on big stacks, and implement a parallel language with heap allocation for activation records (our PARLANSE is one of these) the problem goes away.
Here's the first (sequential) program we wrote in PARLANSE:
Here's an execution run on an i7:
Here's the second, which is parallel:
Making the parallelism explicit makes the programs a lot easier to write, too.
The parallel version we test by calling (parallel_fibonacci 45). Here is the execution run on the same i7 (which arguably has 8 processors, but it is really 4 processors hyperthreaded so it really isn't quite 8 equivalent CPUs):
A speedup near 6+, not bad for not-quite-8 processors. One of the other answers to this question ran the pthreads version; it took "a few seconds" (to blow up) computing Fib(18), and this is 5.5 seconds for Fib(45). This tells you pthreads is a fundamentally bad way to do lots of fine grain parallelism, because it has really, really high forking overhead. (PARLANSE is designed to minimize that forking overhead).
Here's what happens if you set the technology constant to zero (forks on every call to fib):
You can see that amortizing fork overhead is a good idea, even if you have fast forks.
Fib(45) produces a lot of grains. Heap allocation of activation records solves the OP's first-order problem (thousands of pthreads each with 1Mb of stack burns gigabytes of RAM).
But there's a second order problem: 2^45 PARLANSE "grains" will burn all your memory too just keeping track of the grains even if your grain control block is tiny. So it helps to have a scheduler that throttles forks once you have "a lot" (for some definition of "a lot" significantly less that 2^45) grains to prevent the explosion of parallelism from swamping the machine with "grain" tracking data structures. It has to unthrottle forks when the number of grains falls below a threshold too, to make sure there is always lots of logical, parallel work for the physical CPUs to do.