Given a C string (array of characters terminating with a NULL character constant), we have to find the length of the string. Could you please suggest some ways to parallelize this for N number of threads of execution. I am having problem dividing into sub-problems as accessing a location of the array which is not present will give segmentation fault.
EDIT: I am not concerned that doing this task in parallel may have much greater overhead or not. Just want to know if this can be done (using something like openmp etc.)
You could do something ugly like this in Windows enclosing unsafe memory reads in a SEH __try block:
Output:
I think it may be better to use MMX/SSE instructions to speed up the code in a somewhat parallel way.
EDIT: This may be not a very good idea on Windows after all, see Raymond Chen's IsBadXxxPtr should really be called CrashProgramRandomly.
No it can't. Because each step requires the previous state to be known (did we encounter a null on the previous char). You can only safely check 1 character at a time.
Imagine you are turning over rocks and you MUST stop at one with white paint underneath (null) or you will die (aka seg fault etc).
You can't have people "working ahead" of each other, as the white paint rock might be in between.
Having multiple people (threads/processes) would simply be them taking turns being the one turning over the next rock. They would never be turning over rocks at the same time as each other.
Let me acknowledge this,
Following code has been written using C# and not C. You can associate the idea what I am trying to articulate. And most of the content are from a Parallel Pattern (was a draft document by Microsoft on parallel approach)
To do the best static partitioning possible, you need to be able to accurately predict ahead of time how long all the iterations will take. That’s rarely feasible, resulting in a need for a more dynamic partitioning, where the system can adapt to changing workloads quickly. We can address this by shifting to the other end of the partitioning tradeoffs spectrum, with as much load-balancing as possible.
To do that, rather than pushing to each of the threads a given set of indices to process, we can have the threads compete for iterations. We employ a pool of the remaining iterations to be processed, which initially starts filled with all iterations. Until all of the iterations have been processed, each thread goes to the iteration pool, removes an iteration value, processes it, and then repeats. In this manner, we can achieve in a greedy fashion an approximation for the optimal level of load-balancing possible (the true optimum could only be achieved with a priori knowledge of exactly how long each iteration would take). If a thread gets stuck processing a particular long iteration, the other threads will compensate by processing work from the pool in the meantime. Of course, even with this scheme you can still find yourself with a far from optimal partitioning (which could occur if one thread happened to get stuck with several pieces of work significantly larger than the rest), but without knowledge of how much processing time a given piece of work will require, there’s little more that can be done.
Here’s an example implementation that takes load-balancing to this extreme. The pool of iteration values is maintained as a single integer representing the next iteration available, and the threads involved in the processing “remove items” by atomically incrementing this integer:
Do you know the maximum size of that char array? If so, you could do a parallel search in different junks and return the index of the terminator with smallest index. Hence you are then only working on allocated memory, you cannot get segfaults.
Of course this is not as sophisticated as s_nairs answer but pretty straight forward. example:
It's probably not even worth trying. If string is short, overhead will be greater than gain in processing speed. If string is really long, the speed will probably be limited by speed of memory, not by CPU processing speed.
I'd say with just a standard C-string this can not be done. However, if you can define a personal termination string with as many characters as processes - it's straight forward.