Recursive Fib with Threads, Segmentation Fault?

Published 2019-01-15 21:25

Question:

Any ideas why it works fine for values like 0, 1, 2, 3, 4... and seg faults for values >15?

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *fib(void *fibToFind);

main(){
pthread_t mainthread;

long fibToFind = 15;
long finalFib;

pthread_create(&mainthread,NULL,fib,(void*) fibToFind);

pthread_join(mainthread,(void*)&finalFib);

printf("The number is: %d\n",finalFib);
}


void *fib(void *fibToFind){
long retval;

long newFibToFind = ((long)fibToFind);

long returnMinusOne;
long returnMinustwo;

pthread_t minusone;
pthread_t minustwo;

if(newFibToFind == 0 || newFibToFind == 1)
return newFibToFind;

else{
long newFibToFind1 = ((long)fibToFind) - 1;
long newFibToFind2 = ((long)fibToFind) - 2;

pthread_create(&minusone,NULL,fib,(void*) newFibToFind1);
pthread_create(&minustwo,NULL,fib,(void*) newFibToFind2);

pthread_join(minusone,(void*)&returnMinusOne);
pthread_join(minustwo,(void*)&returnMinustwo);

return returnMinusOne + returnMinustwo;

}

}

Answer 1:

It runs out of memory (space for thread stacks), or out of valid thread handles.

You're asking for an awful lot of threads, which require lots of stack/context. Windows (and Linux) have a stupid "big [contiguous] stack" idea.

From the documentation on pthread_create: "On Linux/x86-32, the default stack size for a new thread is 2 megabytes." If you manufacture 10,000 threads, you need 20 GB of RAM for stacks alone. I built a version of the OP's program, and it bombed with some 3500 (p)threads on Windows XP64.
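If you just want to push the failure point out a bit, you can ask for smaller stacks via a thread attribute. A minimal sketch follows; the helper name and the 64 KB figure are illustrative choices, not anything from the OP's code, and this only postpones the blow-up rather than fixing the underlying algorithm:

#include <limits.h>
#include <pthread.h>

/* Illustrative helper: like pthread_create, but with a small explicit
   stack instead of the multi-megabyte default, so thousands of threads
   consume far less address space.  64 KB is an arbitrary example value,
   clamped to the system minimum. */
static int create_small_stack_thread(pthread_t *tid,
                                     void *(*start)(void *), void *arg)
{
    pthread_attr_t attr;
    size_t stack_size = 64 * 1024;
    int rc;

    if (stack_size < PTHREAD_STACK_MIN)
        stack_size = PTHREAD_STACK_MIN;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, stack_size);
    rc = pthread_create(tid, &attr, start, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

Even so, each recursive call still costs a full kernel thread, so the real fix is fewer threads (or lighter-weight tasks).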

See this SO thread for more details on why big stacks are a really bad idea: Why are stack overflows still a problem?

If you give up on big stacks and implement a parallel language with heap allocation for activation records (our PARLANSE is one of these), the problem goes away.

Here's the first (sequential) program we wrote in PARLANSE:

(define fibonacci_argument 45)

(define fibonacci
   (lambda(function natural natural )function 
   `Given n, computes nth fibonacci number'
      (ifthenelse (<= ? 1)
           ?
         (+ (fibonacci (-- ?))
              (fibonacci (- ? 2))
           )+
   )ifthenelse  
   )lambda
 )define

Here's an execution run on an i7:

 C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonaccisequential
 Starting Sequential Fibonacci(45)...Runtime: 33.752067 seconds
 Result: 1134903170

Here's the second, which is parallel:

(define coarse_grain_threshold 30) ; technology constant: tune to amortize fork overhead across lots of work

(define parallel_fibonacci
   (lambda (function natural natural )function 
   `Given n, computes nth fibonacci number'
      (ifthenelse (<= ? coarse_grain_threshold)
           (fibonacci ?)
           (let (;; [n natural ] [m natural ]  )
                   (value (|| (= m (parallel_fibonacci (-- ?)) )=
                              (= n (parallel_fibonacci (- ? 2)) )=
                          )||
                          (+ m n)
                   )value
           )let
       )ifthenelse  
   )lambda
)define

Making the parallelism explicit makes the programs a lot easier to write, too.

We test the parallel version by calling (parallel_fibonacci 45). Here is the execution run on the same i7 (which nominally has 8 processors, but is really 4 hyperthreaded cores, so it isn't quite 8 equivalent CPUs):

C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallelcoarse
Parallel Coarse-grain Fibonacci(45) with cutoff 30...Runtime: 5.511126 seconds
Result: 1134903170

A speedup near 6+, not bad for not-quite-8 processors. One of the other answers to this question ran the pthreads version; it took "a few seconds" (to blow up) computing just Fib(18), while this takes 5.5 seconds for Fib(45). That tells you pthreads is a fundamentally bad way to do lots of fine-grain parallelism, because it has really, really high forking overhead. (PARLANSE is designed to minimize that forking overhead.)

Here's what happens if you set the technology constant to zero (forks on every call to fib):

C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallel
Starting Parallel Fibonacci(45)...Runtime: 15.578779 seconds
Result: 1134903170

You can see that amortizing fork overhead is a good idea, even if you have fast forks.

Fib(45) produces a lot of grains. Heap allocation of activation records solves the OP's first-order problem (thousands of pthreads, each with megabytes of stack, burn gigabytes of RAM).

But there's a second-order problem: the billions of PARLANSE "grains" Fib(45) generates would also exhaust memory just tracking them, even if each grain control block is tiny. So it helps to have a scheduler that throttles forking once you have "a lot" of live grains (for some definition of "a lot" far smaller than billions), to keep the explosion of parallelism from swamping the machine with grain-tracking data structures. It also has to unthrottle forking when the number of grains falls below a threshold, so there is always plenty of logical parallel work for the physical CPUs to do.
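For comparison, here is a rough sketch of what the same coarse-grain cutoff looks like with plain pthreads in C. This is illustrative only, not the OP's program: one side of each split is forked, the other is computed inline, everything below the cutoff runs sequentially, and it assumes long and void* have the same size, as on typical platforms.

#include <pthread.h>

#define CUTOFF 30   /* illustrative "technology constant": no forking below this */

/* Plain sequential Fibonacci, used below the cutoff. */
static long seq_fib(long n)
{
    return n <= 1 ? n : seq_fib(n - 1) + seq_fib(n - 2);
}

/* Parallel Fibonacci: fork one child thread per split above the cutoff,
   compute the other half in the current thread, then join and add. */
static void *par_fib(void *arg)
{
    long n = (long) arg;
    pthread_t child;
    void *left;
    void *right;

    if (n <= CUTOFF)                      /* coarse grain: do the work inline */
        return (void *) seq_fib(n);

    if (pthread_create(&child, NULL, par_fib, (void *) (n - 1)) != 0)
        return (void *) seq_fib(n);       /* fork failed: fall back to sequential */

    right = par_fib((void *) (n - 2));    /* reuse the current thread for one side */
    pthread_join(child, &left);

    return (void *) ((long) left + (long) right);
}

A caller on the main thread would do something like long result = (long) par_fib((void *) 45L);. Even with the cutoff, every fork above the threshold still pays the full cost of creating a kernel thread, which is exactly the overhead PARLANSE's heap-allocated grains are designed to avoid.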



Answer 2:

You are not checking for errors - in particular, from pthread_create(). When pthread_create() fails, the pthread_t variable is left undefined, and the subsequent pthread_join() may crash.

If you do check for errors, you will find that pthread_create() is failing. This is because you are trying to generate almost 2000 threads; with the common 8 MB default stack size, that alone would require about 16 GB of address space for thread stacks.
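As a concrete illustration (the naive call tree for fib(15) has 2*Fib(16) - 1 = 1973 nodes, hence roughly one thread per node), here is a sketch of what checking those calls inside the OP's fib could look like. The pthread functions return an error number rather than setting errno, so strerror() on the return value is used rather than perror(); the exit-on-failure policy is just one illustrative choice, and the snippet additionally needs stdio.h, stdlib.h and string.h.

    /* Inside the OP's fib(): check every create/join instead of ignoring them. */
    int err;

    err = pthread_create(&minusone, NULL, fib, (void *) newFibToFind1);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err)); /* e.g. EAGAIN when out of resources */
        exit(EXIT_FAILURE);
    }

    err = pthread_join(minusone, (void *) &returnMinusOne);
    if (err != 0) {
        fprintf(stderr, "pthread_join: %s\n", strerror(err));
        exit(EXIT_FAILURE);
    }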

You should revise your algorithm so that it does not generate so many threads.



Answer 3:

I tried to run your code, and came across several surprises:

printf("The number is: %d\n", finalFib);

This line has a small error: %d means printf expects an int, but it is passed a long int. On most platforms they are the same size, or will behave the same anyway, but pedantically speaking (or if you just want to stop the warning from coming up, which is a very noble goal too), you should use %ld instead, which expects a long int.

Your fib function, on the other hand, seems non-functional. Testing it on my machine, it doesn't crash, but it yields 1047, which is not a Fibonacci number. Looking closer, it seems your program is incorrect on several aspects:

void *fib(void *fibToFind)
{
    long retval; // retval is never used

    long newFibToFind = ((long)fibToFind);

    long returnMinusOne; // variable is read but never initialized
    long returnMinustwo; // variable is read but never initialized

    pthread_t minusone; // variable is never used (?)
    pthread_t minustwo; // variable is never used

    if(newFibToFind == 0 || newFibToFind == 1)
        // you miss a cast here (but you really shouldn't do it this way)
        return newFibToFind;        
    else{
        long newFibToFind1 = ((long)fibToFind) - 1; // variable is never used
        long newFibToFind2 = ((long)fibToFind) - 2; // variable is never used
        // reading undefined variables (and missing a cast)
        return returnMinusOne + returnMinustwo;

    }
}

Always take care of compiler warnings: when you get one, usually, you really are doing something fishy.

Maybe you should revise the algorithm a little: right now, all your function does is return the sum of two uninitialized values, hence the 1047 I got earlier.

Implementing the Fibonacci sequence with a recursive algorithm means you need to call the function again. As others noted, it's quite an inefficient way of doing it, but it's easy, so I guess that's why all computer science teachers use it as an example.

The regular recursive algorithm looks like this:

int fibonacci(int iteration)
{
    if (iteration == 0 || iteration == 1)
        return 1;

    return fibonacci(iteration - 1) + fibonacci(iteration - 2);
}

I don't know to what extent you were supposed to use threads: just run the algorithm on a secondary thread, or create new threads for each call? Let's assume the first for now, since it's a lot more straightforward.

Casting integers to pointers and vice versa is bad practice because, at a higher level, they are very different things: integers do math, and pointers resolve memory addresses. It happens to work because they're represented the same way, but really, you shouldn't do this. Instead, you might notice that the function called to run your new thread accepts a void* argument: we can use it to convey both where the input is, and where the output will be.

So building upon my previous fibonacci function, you could use this code as the thread main routine:

void* fibonacci_offshored(void* pointer)
{
    int* pointer_to_number = pointer;
    int input = *pointer_to_number;
    *pointer_to_number = fibonacci(input);
    return NULL;
}

It expects a pointer to an integer, takes its input from it, then writes its output there.¹ You would then create the thread like this:

int main()
{
    int value = 15;
    pthread_t thread;

    // on input, value should contain the number of iterations;
    // after the end of the function, it will contain the result of
    //  the fibonacci function
    int result = pthread_create(&thread, NULL, fibonacci_offshored, &value);

    // error checking is important! try to crash gracefully at the very least
    if (result != 0)
    {
        perror("pthread_create");
        return 1;
    }

    if (pthread_join(thread, NULL) != 0)
    {
        perror("pthread_join");
        return 1;
    }

    // now, value contains the output of the fibonacci function
    // (note that value is an int, so just %d is fine)
    printf("The value is %d\n", value);
    return 0;
}

If you need to call the Fibonacci function from new distinct threads (please note: that's not what I'd advise, and others seem to agree with me; it will just blow up for a sufficiently large number of iterations), you'll first need to merge the fibonacci function with the fibonacci_offshored function. This will bulk it up considerably, because dealing with threads is heavier than dealing with regular functions.

void* threaded_fibonacci(void* pointer)
{
    int* pointer_to_number = pointer;
    int input = *pointer_to_number;

    if (input == 0 || input == 1)
    {
        *pointer_to_number = 1;
        return NULL;
    }

    // we need one argument per thread
    int minus_one_number = input - 1;
    int minus_two_number = input - 2;

    pthread_t minus_one;
    pthread_t minus_two;
    // don't forget to check! especially in a recursive function where the
    // number of threads grows instead of shrinking, you're bound to fail
    // at some point
    if (pthread_create(&minus_one, NULL, threaded_fibonacci, &minus_one_number) != 0)
    {
        perror("pthread_create");
        *pointer_to_number = 0;
        return NULL;
    }
    if (pthread_create(&minus_two, NULL, threaded_fibonacci, &minus_two_number) != 0)
    {
        perror("pthread_create");
        *pointer_to_number = 0;
        return NULL;
    }

    if (pthread_join(minus_one, NULL) != 0)
    {
        perror("pthread_join");
        *pointer_to_number = 0;
        return NULL;
    }

    if (pthread_join(minus_two, NULL) != 0)
    {
        perror("pthread_join");
        *pointer_to_number = 0;
        return NULL;
    }

    *pointer_to_number = minus_one_number + minus_two_number;
    return NULL;
}

Now that you have this bulky function, adjustments to your main function are going to be quite easy: just change the reference to fibonacci_offshored to threaded_fibonacci.

int main()
{
    int value = 15;
    pthread_t thread;

    int result = pthread_create(&thread, NULL, threaded_fibonacci, &value);

    if (result != 0)
    {
        perror("pthread_create");
        return 1;
    }
    pthread_join(thread, NULL);

    printf("The value is %d\n", value);
    return 0;
}

You might have been told that threads speed up parallel processes, but there's a point past which setting up a thread costs more than running its contents. This is a very good example of such a situation: the threaded version of the program runs much, much slower than the non-threaded one.

For educational purposes, this program runs out of threads on my machine when the number of desired iterations is 18, and takes a few seconds to run. By comparison, using an iterative implementation, we never run out of threads, and we have our answer in a matter of milliseconds. It's also considerably simpler. This would be a great example of how using a better algorithm fixes many problems.
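For reference, the iterative version mentioned above is only a few lines. A minimal sketch, using the same convention as the recursive fibonacci() earlier (it returns 1 for iterations 0 and 1); note that int overflows somewhere in the mid-40s of iterations:

int fibonacci_iterative(int iteration)
{
    int previous = 1; // value for iteration - 2
    int current  = 1; // value for iteration - 1

    for (int i = 2; i <= iteration; i++)
    {
        int next = previous + current;
        previous = current;
        current  = next;
    }

    // for iteration 0 or 1 the loop never runs and we return 1,
    // matching the recursive version's base case
    return current;
}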

Also, out of curiosity, it would be interesting to see if it crashes on your machine, and where/how.

1. Usually, you should try to avoid changing the meaning of a variable between its value on input and its value after the function returns. For instance, here, on input, the variable is the number of iterations we want; on output, it's the result of the function. Those are two very different meanings, and that's not really good practice. I didn't feel like using dynamic allocation to return a value through the void* return value.