I have a friendly competition with a couple of guys in the field of programming, and recently we have become very interested in writing efficient code. Our challenge was to optimize the code (in terms of CPU time and complexity) at any cost (readability, reusability, etc.).
The problem is that we now need to compare our code and see which approach is better than the others, but we don't know of any tools for this purpose.
My question is: are there any tools that take a piece of code as input and calculate the number of FLOPs or CPU instructions needed to run it? Is there any tool that can measure how optimal a piece of code is?
P.S. The target language is C++, but it would be nice to know whether such tools also exist for Java.
Using inline assembly, you can execute the rdtsc instruction, which puts the low 32 bits of the time-stamp counter into eax and the high 32 bits into edx. If your code is short, you can get an approximate total cycle count from the eax register alone. If the count exceeds the maximum 32-bit value, edx increments once each time eax wraps around.
Output on my machine: 74,000 CPU cycles for 1,000 iterations and 800,000 CPU cycles for 10,000 iterations, because clock() is time-consuming.
Cycle-count resolution on my machine: ~1000 cycles. So yes, you need more than several thousand additions/subtractions (fast instructions) to measure them with reasonable accuracy.
Assuming the CPU's working frequency is constant, 1000 CPU cycles is about 1 microsecond on a 1 GHz CPU. You should warm your CPU up before doing this.
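A minimal sketch of the same idea without hand-written assembly, assuming a GCC, Clang, or MSVC compiler on x86, where the __rdtsc() intrinsic returns the counter as a single 64-bit value (so there is no need to combine eax and edx by hand). The names read_ticks and ticks_for are hypothetical helpers, and the fallback branch exists only so the sketch still compiles on non-x86 targets:

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  if defined(_MSC_VER)
#    include <intrin.h>      // MSVC home of __rdtsc()
#  else
#    include <x86intrin.h>   // GCC/Clang home of __rdtsc()
#  endif
#  define SKETCH_HAS_RDTSC 1
#else
#  define SKETCH_HAS_RDTSC 0
#endif

// Returns the time-stamp counter as one 64-bit value (EDX:EAX combined).
// Assumption: on non-x86 targets we fall back to steady_clock ticks,
// which are NOT CPU cycles; the fallback is only there to keep this portable.
inline std::uint64_t read_ticks() {
#if SKETCH_HAS_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: ticks elapsed while calling `work` `iterations` times.
template <typename F>
std::uint64_t ticks_for(F work, int iterations) {
    const std::uint64_t start = read_ticks();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    return read_ticks() - start;
}
```

Calling ticks_for with a small lambda and a large iteration count, then dividing by the iteration count, gives a rough per-call figure. On many modern CPUs the counter advances at a fixed rate regardless of the current core frequency, so the result is better read as "reference ticks" than as exact core cycles.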
It is quite difficult to calculate the exact CPU time of a block of code. The normal way to do this is to design worst-, average-, and best-case input data as test cases, and then time your real code with those test cases. No tool can tell you the FLOPs without the detailed input data and conditions.
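As a rough illustration of timing the same code on several designed inputs, here is a sketch in which std::sort stands in for the code under test. The helper names (seconds_to_sort, make_inputs, TestInputs) are hypothetical, and which input counts as best, average, or worst depends entirely on the algorithm being measured:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: wall-clock seconds taken to sort a copy of `input`.
// std::sort is only a stand-in for "the code under test".
double seconds_to_sort(std::vector<int> input) {
    const auto start = std::chrono::steady_clock::now();
    std::sort(input.begin(), input.end());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Three illustrative inputs of the same size with different orderings.
struct TestInputs {
    std::vector<int> ascending;
    std::vector<int> shuffled;
    std::vector<int> descending;
};

TestInputs make_inputs(std::size_t n) {
    TestInputs t;
    t.ascending.resize(n);
    std::iota(t.ascending.begin(), t.ascending.end(), 0);  // 0, 1, ..., n-1

    t.shuffled = t.ascending;
    std::mt19937 rng(42);  // fixed seed so every contestant sees the same data
    std::shuffle(t.shuffled.begin(), t.shuffled.end(), rng);

    t.descending.assign(t.ascending.rbegin(), t.ascending.rend());
    return t;
}
```

Each contestant's implementation would then be timed on the same three inputs, ideally several times per input, keeping the minimum or median to reduce noise.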
Here's a little C++11 stopwatch I like to roll out when I need to time something:
Usage:
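A minimal sketch of such a stopwatch, assuming a hypothetical Stopwatch class built on std::chrono::high_resolution_clock. The class name and its members are illustrative and are not the original code:

```cpp
#include <chrono>

// Hypothetical stopwatch; the name and interface are illustrative only.
class Stopwatch {
public:
    using Clock = std::chrono::high_resolution_clock;

    // Starts timing at construction.
    Stopwatch() : start_(Clock::now()) {}

    // Restarts the measurement from the current instant.
    void reset() { start_ = Clock::now(); }

    // Seconds elapsed since construction or the last reset().
    double elapsed_seconds() const {
        return std::chrono::duration<double>(Clock::now() - start_).count();
    }

private:
    Clock::time_point start_;
};
```

With this sketch, usage would amount to constructing a Stopwatch just before the code to be timed and reading elapsed_seconds() right after it.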
On any decent implementation, the default high_resolution_clock should give very accurate timing information.
Best for your purposes is valgrind/callgrind, which counts the instructions your program executes.
There is the std::clock() function from <ctime>, which returns how much CPU time was spent on the current process (that means it doesn't count the time the program was idle because the CPU was executing other tasks). This function can be used to accurately measure execution times of algorithms. Use the constant CLOCKS_PER_SEC (a macro, also from <ctime>, so it takes no std:: prefix) to convert the return value into seconds.
Measuring the number of CPU instructions is pretty useless.
Performance is relative to the bottleneck: depending on the problem at hand, the bottleneck might be the network, disk I/O, memory, or the CPU.
For just a friendly competition, I would suggest timing, which of course implies providing test cases that are big enough to give meaningful measurements.
On Unix, you can use gettimeofday for relatively precise measurements.
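A minimal sketch of timing with gettimeofday, shown next to std::clock() for comparison, assuming a POSIX system that provides <sys/time.h>. The helper names wall_seconds and cpu_seconds are hypothetical. The first measures elapsed wall-clock time, the second measures CPU time consumed by the process:

```cpp
#include <ctime>
#include <sys/time.h>  // POSIX only: declares gettimeofday and timeval

// Hypothetical helper: wall-clock seconds elapsed while running `work`.
template <typename F>
double wall_seconds(F work) {
    timeval start, stop;
    gettimeofday(&start, nullptr);
    work();
    gettimeofday(&stop, nullptr);
    // Combine whole seconds and microseconds into one floating-point value.
    return static_cast<double>(stop.tv_sec - start.tv_sec) +
           static_cast<double>(stop.tv_usec - start.tv_usec) / 1e6;
}

// Hypothetical helper: CPU seconds consumed by the process while running `work`.
template <typename F>
double cpu_seconds(F work) {
    const std::clock_t start = std::clock();
    work();
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

For a CPU-bound, single-threaded test case the two numbers should be close. A large gap usually suggests the code was waiting on something else, such as I/O or other processes.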