My system is i5-Dual core with hyper-threading. Windows show me 4 processors. When i run a single optimized cpu-bound task by a single thread at a time its service time always display arround 35ms. But when i handover 2 tasks to 2 threads simultanously their service times display arround 70ms. I want to ask that my system have 4 processors then why does service times are arround 70 in case of 2 threads running teir tasks whereas 2 threads should run on 2 processors without any scheduling overhead.The codes are as follows.
CPU-Bound Task is as follows.
import java.math.BigInteger;
public class CpuBoundJob implements Runnable {
public void run() {
BigInteger factValue = BigInteger.ONE;
long t1=System.nanoTime();
for ( int i = 2; i <= 2000; i++){
factValue = factValue.multiply(BigInteger.valueOf(i));
}
long t2=System.nanoTime();
System.out.println("Service Time(ms)="+((double)(t2-t1)/1000000));
}
}
Thread that runs a task is as follows.
public class TaskRunner extends Thread {
CpuBoundJob job=new CpuBoundJob();
public void run(){
job.run();
}
}
And Finally, main class is as follows.
public class Test2 {
int numberOfThreads=100;//warmup code for JIT
public Test2(){
for(int i=1;i<=numberOfThreads;i++){//warmup code for JIT
TaskRunner t=new TaskRunner();
t.start();
}
try{
Thread.sleep(5000);// wait a little bit
}catch(Exception e){}
System.out.println("Warmed up completed! now start benchmarking");
System.out.println("First run single thread at a time");
try{//wait for the thread to complete
Thread.sleep(5000);
}catch(Exception e){}
//run only one thread at a time
TaskRunner t1=new TaskRunner();
t1.start();
try{//wait for the thread to complete
Thread.sleep(5000);
}catch(Exception e){}
//Now run 2 threads simultanously at a time
System.out.println("Now run 3 thread at a time");
for(int i=1;i<=3;i++){//run 2 thread at a time
TaskRunner t2=new TaskRunner();
t2.start();
}
}
public static void main(String[] args) {
new Test2();
}
Final output:
Warmed up completed! now start benchmarking First run single thread at
a time Service Time(ms)=5.829112 Now run 2 thread at a time Service
Time(ms)=6.518721 Service Time(ms)=10.364269 Service
Time(ms)=10.272689
I timed this in a variety of scenarios, and with a slightly modified task, got times of ~45 ms with one thread and ~60 ms for two threads. So, even in this example, in one second, one thread can complete about 22 tasks, but two threads can complete 33 tasks.
However, if you run a task that doesn't tax the garbage collector so grievously, you should see the performance increase you expect: two threads complete twice as many tasks. Here is my version of your test program.
Note that I made one significant change to your task (DirtyTask
): n
was always 0, because you cast the result of Math.random()
to an int
(which is zero), and then multiplied by 13.
Then I added a CleanTask
that doesn't generate any new objects for the garbage collector to handle. Please test and report the results on your machine. On mine, I got this:
Testing "clean" task.
Average task time: one thread = 46 ms; two threads = 45 ms
Testing "dirty" task.
Average task time: one thread = 41 ms; two threads = 62 ms
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
final class Parallels
{
private static final int RUNS = 10;
public static void main(String... argv)
throws Exception
{
System.out.println("Testing \"clean\" task.");
flavor(CleanTask::new);
System.out.println("Testing \"dirty\" task.");
flavor(DirtyTask::new);
}
private static void flavor(Supplier<Callable<Long>> tasks)
throws InterruptedException, ExecutionException
{
ExecutorService warmup = Executors.newFixedThreadPool(100);
for (int i = 0; i < 100; ++i)
warmup.submit(tasks.get());
warmup.shutdown();
warmup.awaitTermination(1, TimeUnit.DAYS);
ExecutorService workers = Executors.newFixedThreadPool(2);
long t1 = test(1, tasks, workers);
long t2 = test(2, tasks, workers);
System.out.printf("Average task time: one thread = %d ms; two threads = %d ms%n", t1 / (1 * RUNS), t2 / (2 * RUNS));
workers.shutdown();
}
private static long test(int n, Supplier<Callable<Long>> tasks, ExecutorService workers)
throws InterruptedException, ExecutionException
{
long sum = 0;
for (int i = 0; i < RUNS; ++i) {
List<Callable<Long>> batch = new ArrayList<>(n);
for (int t = 0; t < n; ++t)
batch.add(tasks.get());
List<Future<Long>> times = workers.invokeAll(batch);
for (Future<Long> f : times)
sum += f.get();
}
return TimeUnit.NANOSECONDS.toMillis(sum);
}
/**
* Do something on the CPU without creating any garbage, and return the
* elapsed time.
*/
private static class CleanTask
implements Callable<Long>
{
@Override
public Long call()
{
long time = System.nanoTime();
long x = 0;
for (int i = 0; i < 15_000_000; i++)
x ^= ThreadLocalRandom.current().nextLong();
if (x == 0)
throw new IllegalStateException();
return System.nanoTime() - time;
}
}
/**
* Do something on the CPU that creates a lot of garbage, and return the
* elapsed time.
*/
private static class DirtyTask
implements Callable<Long>
{
@Override
public Long call()
{
long time = System.nanoTime();
String s = "";
for (int i = 0; i < 10_000; i++)
s += (int) (ThreadLocalRandom.current().nextDouble() * 13);
if (s.length() == 10_000)
throw new IllegalStateException();
return System.nanoTime() - time;
}
}
}
for(int i=0;i<10000;i++)
{
int n=(int)Math.random()*13;
s+=name.valueOf(n);
//s+="*";
}
This code is a tight spin around a resource that can only be accessed by one thread at a time. So each thread just has to wait for the other to release the random number generator so that it can access it.
As the docs for Math.random
say:
When this method is first called, it creates a single new pseudorandom-number generator, exactly as if by the expression
new java.util.Random()
This new pseudorandom-number generator is used thereafter for all calls to this method and is used nowhere else.
This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.