This is quite an interesting question, so let me set the scene. I work at The National Museum of Computing, and we have just managed to get a Cray Y-MP EL supercomputer from 1992 running, and we really want to see how fast it can go!
We decided the best way to do this was to write a simple C program that would calculate prime numbers and show how long it took to do so, then run the program on a fast modern desktop PC and compare the results.
We quickly came up with this code to count prime numbers:
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <time.h>

int main(void) {
    clock_t start, end;
    double runTime;
    start = clock();
    int i, num = 1, primes = 0;

    while (num <= 1000) {
        /* Trial division: find the smallest divisor of num that is >= 2. */
        i = 2;
        while (i <= num) {
            if (num % i == 0)
                break;
            i++;
        }
        /* num is prime only if its smallest divisor is num itself. */
        if (i == num)
            primes++;

        system("clear");
        printf("%d prime numbers calculated\n", primes);
        num++;
    }

    end = clock();
    runTime = (end - start) / (double) CLOCKS_PER_SEC;
    printf("This machine calculated all %d prime numbers under 1000 in %g seconds\n", primes, runTime);
    return 0;
}
On our dual-core laptop running Ubuntu (the Cray runs UNICOS), this worked perfectly, getting 100% CPU usage and taking about 10 minutes or so. When I got home I decided to try it on my hex-core modern gaming PC, and this is where we hit our first issues.
I first adapted the code to run on Windows since that is what the gaming PC was using, but was saddened to find that the process was only getting about 15% of the CPU's power. I figured that must be Windows being Windows, so I booted into a Live CD of Ubuntu thinking that Ubuntu would allow the process to run with its full potential as it had done earlier on my laptop.
However, I only got 5% usage! So my question is, how can I adapt the program to run on my gaming machine in either Windows 7 or live Linux at 100% CPU utilisation? Another thing that would be great, but not necessary, is if the end product could be one .exe that can be easily distributed and run on Windows machines.
Thanks a lot!
P.S. Of course this program didn't really work with the Cray's 8 specialist processors, and that is a whole other issue... If you know anything about optimising code to work on '90s Cray supercomputers, give us a shout too!
TL;DR: The accepted answer is both inefficient and incompatible. The following algorithm works 100x faster.
The gcc compiler available on Mac can't run omp. I had to install llvm (brew install llvm). But I didn't see CPU idle going down while running the OMP version.
(Screenshot: CPU usage while the OMP version was running.)
Alternatively, I used basic POSIX threads, which work with any C compiler, and saw almost the entire CPU used up when number of threads = number of cores = 4 (MacBook Pro, 2.3 GHz Intel Core i5). Here is the programme. Notice how the entire CPU is used up.
P.S. If you increase the number of threads, actual CPU usage goes down (try setting the number of threads to 20), as the system spends more time context switching than actually computing.
By the way, my machine is not as beefy as @mystical's (accepted answer), but my version with basic POSIX threading works way faster than the OMP one.
P.S. Increase threadload to 2.5 million to see the CPU usage, as otherwise it completes in less than a second.
For a quick improvement on one core, remove system calls to reduce context switching. Remove these lines:
    system("clear");
    printf("%d prime numbers calculated\n",primes);
The first is particularly bad, as it will spawn a new process every iteration.
Simply try to zip and unzip a big file; nothing can use the CPU like heavy I/O operations.
If you want 100% CPU, you need to use more than 1 core. To do that, you need multiple threads.
Here's a parallel version using OpenMP:
I had to increase the limit to 1000000 to make it take more than 1 second on my machine.
(Screenshot: the parallel version at 100% CPU usage.)
Your algorithm to generate prime numbers is very inefficient. Compare it to primegen, which generates the 50847534 primes up to 1000000000 in just 8 seconds on a Pentium II-350.
To consume all CPUs easily, you could solve an embarrassingly parallel problem, e.g., compute the Mandelbrot set or use genetic programming to paint the Mona Lisa in multiple threads (processes).
Another approach is to take an existing benchmark program for the Cray supercomputer and port it to a modern PC.
Try to parallelize your program using, e.g., OpenMP. It is a very simple and effective framework for writing parallel programs.