Sequential:
void do(List<D> d, final List<C> c) {
for (D datum : d)
getChampoid(datum, c).tally(datum);
Parallel:
static final int procs = Runtime.getRuntime().availableProcessors();
static final ExecutorService pool = Executors.newFixedThreadPool(procs);
void do(List<D> d, final List<C> c) {
List<Future> futures = new ArrayList<>();
for (final D datum : d)
futures.add(pool.submit(new Runnable() {
@Override
public void run() {
getChampoid(datum, c).tally(datum);
}
}));
for (Future f : futures)
try {
f.get();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
I'm stumped because to me they look like they do the exact same thing, the parallel version should just be faster, but it's an order of magnitude slower. Any thoughts?
FYI both d and c are huge lists with somewhere between thousands and hundreds of thousands of items.
A) have you ruled out any synchronization? If
getChampoid
hits on a synchronized data structure (such as a Hashtable), i'm not surprised performance drops drastically.B) with Java, Object overhead is usually what kills performance. Make sure to use profilers such as VisualVM. In your case, I would not be surprised if you see a lot of time going into garbage collection.
Consider breaking your list
d
into a small number of parts, then processing each part in a thread. Right now, for everydatum
you will create aRunnable
and aFuture
object, and probably some more. Which, whend
is large, means a lot of work for the garbage collector (and here, it cannot be optimized away by Hotspot either).That is a common misconception.
A common result
Using multiple threads has overhead which one thread doesn't have. The overhead can be orders of magnitude higher than the work actually done, making it much slower. If the work done is much larger than the overhead you can get very good scalability.
e.g. say it costs about 10 micro-seconds to pass a task to another thread. If your task takes 1 micro-second, the overhead can kill your performance. However, if the task takes 100 micro-seconds, you could see a significant performance improvement and worth paying the price of the overhead.
In short, nothing is free esp not multiple thread.