How is the fork/join framework better than a threa

2019-01-04 15:49发布

What are the benefits of using the new fork/join framework over just simply splitting the big task into N subtasks in the beginning, sending them to a cached thread pool (from Executors) and waiting for each task to complete? I fail to see how using the fork/join abstraction simplifies the problem or makes the solution more efficient from what we've had for years now.

For example, the parallelized blurring algorithm in the tutorial example could be implemented like this:

public class Blur implements Runnable {
    private int[] mSource;
    private int mStart;
    private int mLength;
    private int[] mDestination;

    private int mBlurWidth = 15; // Processing window size, should be odd.

    public ForkBlur(int[] src, int start, int length, int[] dst) {
        mSource = src;
        mStart = start;
        mLength = length;
        mDestination = dst;
    }

    public void run() {
        computeDirectly();
    }

    protected void computeDirectly() {
        // As in the example, omitted for brevity
    }
}

Split in the beginning and send tasks to a thread pool:

// source image pixels are in src
// destination image pixels are in dst
// threadPool is a (cached) thread pool

int maxSize = 100000; // analogous to F-J's "sThreshold"
List<Future> futures = new ArrayList<Future>();

// Send stuff to thread pool:
for (int i = 0; i < src.length; i+= maxSize) {
    int size = Math.min(maxSize, src.length - i);
    ForkBlur task = new ForkBlur(src, i, size, dst);
    Future f = threadPool.submit(task);
    futures.add(f);
}

// Wait for all sent tasks to complete:
for (Future future : futures) {
    future.get();
}

// Done!

The tasks go to the thread pool's queue, from which they're executed as worker threads become available. As long as the splitting is granular enough (to avoid having to particularly wait for the last task) and the thread pool has enough (at least N of processors) threads, all processors are working at full speed until the whole computation is done.

Am I missing something? What's the added value of using the fork/join framework?

10条回答
淡お忘
2楼-- · 2019-01-04 16:31

Fork/join is different from a thread pool because it implements work stealing. From Fork/Join

As with any ExecutorService, the fork/join framework distributes tasks to worker threads in a thread pool. The fork/join framework is distinct because it uses a work-stealing algorithm. Worker threads that run out of things to do can steal tasks from other threads that are still busy.

Say you have two threads, and 4 tasks a, b, c, d which take 1, 1, 5 and 6 seconds respectively. Initially, a and b are assigned to thread 1 and c and d to thread 2. In a thread pool, this would take 11 seconds. With fork/join, thread 1 finishes and can steal work from thread 2, so task d would end up being executed by thread 1. Thread 1 executes a, b and d, thread 2 just c. Overall time: 8 seconds, not 11.

EDIT: As Joonas points out, tasks are not necessarily pre-allocated to a thread. The idea of fork/join is that a thread can choose to split a task into multiple sub-pieces. So to restate the above:

We have two tasks (ab) and (cd) which take 2 and 11 seconds respectively. Thread 1 starts to execute ab and split it into two sub-tasks a & b. Similarly with thread 2, it splits into two sub-tasks c & d. When thread 1 has finished a & b, it can steal d from thread 2.

查看更多
一夜七次
3楼-- · 2019-01-04 16:33

If the problem is such that we have to wait for other threads to complete(as in case of sorting of array or sum of array), fork join should be used, as Executor(Executors.newFixedThreadPool(2)) will choke due to limited number of threads. The forkjoin pool will create more threads in this case to coverup for the blocked thread to maintain same parallelism

Source: http://www.oracle.com/technetwork/articles/java/fork-join-422606.html

The problem with the executors for implementing divide and conquer algorithms is not related to creating subtasks, because a Callable is free to submit a new subtask to its executor and wait for its result in a synchronous or asynchronous fashion. The issue is that of parallelism: When a Callable waits for the result of another Callable, it is put in a waiting state, thus wasting an opportunity to handle another Callable queued for execution.

The fork/join framework added to the java.util.concurrent package in Java SE 7 through Doug Lea’s efforts fills that gap

Source: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinPool.html

The pool attempts to maintain enough active (or available) threads by dynamically adding, suspending, or resuming internal worker threads, even if some tasks are stalled waiting to join others. However, no such adjustments are guaranteed in the face of blocked IO or other unmanaged synchronization

public int getPoolSize() Returns the number of worker threads that have started but not yet terminated. The result returned by this method may differ from getParallelism() when threads are created to maintain parallelism when others are cooperatively blocked.

查看更多
唯我独甜
4楼-- · 2019-01-04 16:33

You would be amazed on ForkJoin performance in application like crawler. here is the best tutorial you would learn from.

Fork/Join's logic is very simple: (1) separate (fork) each large task into smaller tasks; (2) process each task in a separate thread (separating those into even smaller tasks if necessary); (3) join the results.

查看更多
姐就是有狂的资本
5楼-- · 2019-01-04 16:35

In this example Fork/Join adds no value because forking is not needed and the workload is evenly split across worker threads. Fork/Join only adds overhead.

Here is a nice article on the subject. Quote:

Overall, we can say that the ThreadPoolExecutor is to be preferred where the workload is evenly split across worker threads. To be able to guarantee this, you do need to know precisely what the input data looks like. By contrast, the ForkJoinPool provides good performance irrespective of the input data and is thus a significantly more robust solution.

查看更多
登录 后发表回答