Why does this parallel algorithm run more slowly than the sequential one?

Posted 2020-05-07 11:15

Question:

Sequential:

// 'do' is a reserved word in Java; the method is named process here
void process(List<D> d, final List<C> c) {
    for (D datum : d)
        getChampoid(datum, c).tally(datum);
}

Parallel:

static final int procs = Runtime.getRuntime().availableProcessors();
static final ExecutorService pool = Executors.newFixedThreadPool(procs);

void process(List<D> d, final List<C> c) {
    List<Future<?>> futures = new ArrayList<>();
    // Submit one task per datum
    for (final D datum : d)
        futures.add(pool.submit(new Runnable() {
            @Override
            public void run() {
                getChampoid(datum, c).tally(datum);
            }
        }));
    // Block until every task has completed
    for (Future<?> f : futures)
        try {
            f.get();
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
}

I'm stumped: to me they look like they do exactly the same thing, so the parallel version should simply be faster, but it's an order of magnitude slower. Any thoughts?

FYI, both d and c are huge lists, each holding somewhere between thousands and hundreds of thousands of items.

Answer 1:

the parallel version should just be faster,

That is a common misconception.

but it's an order of magnitude slower.

That is a common result.

Any thoughts?

Using multiple threads has overhead that a single thread doesn't have: tasks must be created, queued, handed to another thread, and their results collected. That overhead can be orders of magnitude higher than the work actually done, making the parallel version much slower. Only when the work per task is much larger than the overhead do you get good scalability.

E.g. say it costs about 10 microseconds to hand a task to another thread. If the task itself takes 1 microsecond, the overhead kills your performance. If the task takes 100 microseconds, however, you can see a significant performance improvement, and the overhead is worth paying.
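A self-contained sketch of that trade-off, for the curious: it runs the same trivial work once as one task per item and once as one task per chunk, on the same pool. GranularityDemo, work, and N are all illustrative names, not from the question, and the timings will vary by machine.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GranularityDemo {
    static final int N = 1_000_000;

    // Deliberately tiny unit of work: far cheaper than submitting a task.
    static long work(long x) { return x * 31 + 7; }

    public static void main(String[] args) throws Exception {
        int procs = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(procs);

        // One task per item: N submissions and N Futures, so overhead dominates.
        long t0 = System.nanoTime();
        List<Future<Long>> fine = new ArrayList<>();
        for (int i = 0; i < N; i++) {
            final long x = i;
            fine.add(pool.submit(() -> work(x)));
        }
        for (Future<Long> f : fine) f.get();
        System.out.printf("one task per item:  %d ms%n", (System.nanoTime() - t0) / 1_000_000);

        // One task per chunk: the submission overhead is paid only ~procs times.
        long t1 = System.nanoTime();
        List<Future<Long>> coarse = new ArrayList<>();
        int chunk = (N + procs - 1) / procs;
        for (int start = 0; start < N; start += chunk) {
            final int from = start, to = Math.min(start + chunk, N);
            coarse.add(pool.submit(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) sum += work(i);
                return sum;
            }));
        }
        for (Future<Long> f : coarse) f.get();
        System.out.printf("one task per chunk: %d ms%n", (System.nanoTime() - t1) / 1_000_000);

        pool.shutdown();
    }
}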

In short, nothing is free, especially not multiple threads.



Answer 2:

A) Have you ruled out synchronization? If getChampoid hits a synchronized data structure (such as a Hashtable), I'm not surprised performance drops drastically; a minimal illustration follows.
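A sketch of what that failure mode looks like, with a hypothetical Champoid whose tally funnels every update through a Hashtable (the class body and the String key are illustrative; the question's real classes may differ):

import java.util.Hashtable;

// Hypothetical Champoid: every Hashtable method is synchronized, so all
// worker threads queue on one monitor and effectively run one at a time,
// while also paying the cost of the lock contention itself.
class Champoid {
    private final Hashtable<String, Integer> counts = new Hashtable<>();

    void tally(String key) {
        counts.merge(key, 1, Integer::sum); // takes the single shared lock
    }
}

Swapping in a ConcurrentHashMap, or giving each thread its own accumulator and merging at the end, removes the single point of contention.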

B) With Java, object overhead is usually what kills performance. Make sure to use a profiler such as VisualVM. In your case, I would not be surprised if you see a lot of time going into garbage collection.

Consider breaking your list d into a small number of parts and processing each part in its own thread, as sketched below. Right now you create a Runnable and a Future object (and probably more) for every datum, which, when d is large, means a lot of work for the garbage collector, and here it cannot be optimized away by HotSpot either.
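A minimal sketch of that chunking approach, reusing D, C, and getChampoid from the question (the method name process and the chunk-size choice are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

static final int procs = Runtime.getRuntime().availableProcessors();
static final ExecutorService pool = Executors.newFixedThreadPool(procs);

// One task per chunk instead of one task per datum: only about procs
// Runnable/Future objects are allocated, not d.size() of them.
void process(final List<D> d, final List<C> c) {
    int chunkSize = Math.max(1, (d.size() + procs - 1) / procs);
    List<Future<?>> futures = new ArrayList<>();
    for (int start = 0; start < d.size(); start += chunkSize) {
        // subList is a read-only view here, so no copying is needed
        final List<D> chunk = d.subList(start, Math.min(start + chunkSize, d.size()));
        futures.add(pool.submit(new Runnable() {
            @Override
            public void run() {
                for (D datum : chunk)
                    getChampoid(datum, c).tally(datum);
            }
        }));
    }
    for (Future<?> f : futures)
        try {
            f.get();
        } catch (Exception e) {
            e.printStackTrace();
        }
}

Note this only pays off if tally itself is thread-safe without funneling everything through one lock; otherwise point A) still applies.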