I want to benchmark some C/C++ code and measure CPU time, wall time, and cycles/byte. I wrote some measurement functions, but I have a problem with cycles/byte.
To get CPU time I wrote a function that calls `getrusage()` with `RUSAGE_SELF`, for wall time I use `clock_gettime` with `CLOCK_MONOTONIC`, and to get cycles/byte I use `rdtsc`.
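For reference, the three timers mentioned above could look roughly like the sketch below. The names `cpu_seconds`, `wall_seconds` and `read_tsc` are made up for illustration; they are not the `cputime`, `walltime` and `cyclecount` functions used in the rest of this post, whose implementation is not shown. The `__rdtsc()` intrinsic assumes GCC or Clang on x86.

```c
#define _XOPEN_SOURCE 600   /* expose clock_gettime() and getrusage() */
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <x86intrin.h>      /* __rdtsc(); assumption: GCC/Clang on x86 */

/* CPU time (user + system) consumed by this process so far, in seconds. */
double cpu_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_stime.tv_sec
         + (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

/* Monotonic wall-clock time in seconds (arbitrary starting point). */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Raw time-stamp counter. On many modern CPUs it ticks at a constant
 * reference rate, which can differ from the current core clock. */
uint64_t read_tsc(void)
{
    return __rdtsc();
}
```

Each function returns an absolute reading, so a measurement is the difference between two calls taken before and after the code under test.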
I process an input buffer of, for example, 1024 bytes: `char buffer[1024]`. This is how I benchmark:
- Do a warm-up phase by simply calling `fun2measure(args)` 1000 times:
for (int i = 0; i < 1000; i++)
    fun2measure(args);
Then do the real timing run. For wall time:
`unsigned long i;
double timeTaken;
double timeTotal = 3.0; // run for 3 seconds
for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = walltime(1), i++)
    fun2measure(args);`
And for CPU time (almost the same, but using `cputime`):
for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = cputime(1), i++)
    fun2measure(args);
But when I want to get a CPU cycle count for the function, I use this piece of code:
`// wall-time variant
unsigned long s = cyclecount();
for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = walltime(1), i++)
{
    fun2measure(args);
}
unsigned long e = cyclecount();

// CPU-time variant (a separate snippet, so s and e are declared again)
unsigned long s = cyclecount();
for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = cputime(1), i++)
{
    fun2measure(args);
}
unsigned long e = cyclecount();`
and then compute cycles/byte: `(e - s) / (i * inputsSize)`. Here `inputsSize` is 1024 because it's the length of the buffer. But when I raise `timeTotal` to 10 s, I get strange results:
for 10s:
Did fun2measure 1148531 times in 10.00 seconds for 1024 bytes, 0 cycles/byte [CPU]
Did fun2measure 1000221 times in 10.00 seconds for 1024 bytes, 3.000000 cycles/byte [WALL]
for 5s:
Did fun2measure 578476 times in 5.00 seconds for 1024 bytes, 0 cycles/byte [CPU]
Did fun2measure 499542 times in 5.00 seconds for 1024 bytes, 7.000000 cycles/byte [WALL]
for 4s:
Did fun2measure 456828 times in 4.00 seconds for 1024 bytes, 4 cycles/byte [CPU]
Did fun2measure 396612 times in 4.00 seconds for 1024 bytes, 3.000000 cycles/byte [WALL]
My questions:
- Are those results ok?
- Why do I always get 0 cycles/byte for CPU time when I increase the run time?
- How can I compute statistics such as the mean and standard deviation for this kind of benchmark?
- Is my benchmarking method 100% ok?
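On the statistics question, one possible approach (a sketch under assumptions, not a definitive method) is to split the total budget into several shorter runs, record one sample per run (for example its cycles/byte), and accumulate the mean and variance with Welford's online algorithm. The `run_stats` type and the `stats_*` function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Running accumulator for Welford's online mean/variance algorithm. */
struct run_stats {
    size_t n;     /* number of samples seen so far */
    double mean;  /* running mean */
    double m2;    /* sum of squared deviations from the running mean */
};

void stats_init(struct run_stats *s)
{
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold one sample (e.g. the cycles/byte of one run) into the accumulator. */
void stats_add(struct run_stats *s, double x)
{
    double delta;
    s->n++;
    delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance (n - 1 in the denominator); needs at least 2 samples.
 * The standard deviation is sqrt() of this value (<math.h>, link with -lm). */
double stats_variance(const struct run_stats *s)
{
    assert(s->n >= 2);
    return s->m2 / (double)(s->n - 1);
}
```

Feeding in one sample per run keeps memory use constant and avoids the precision loss of summing squares directly.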
CHEERS!
1st EDIT:
After changing `i` to `double`:
Did fun2measure 1138164.00 times in 10.00 seconds for 1024 bytes, 0.410739 cycles/byte [CPU]
Did fun2measure 999849.00 times in 10.00 seconds for 1024 bytes, 3.382036 cycles/byte [WALL]
My results seem to be OK now, so question #2 isn't a question anymore :)
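The change is consistent with integer division: when every operand of `(e - s) / (i * inputsSize)` is an unsigned integer, the fractional part is discarded, so 0.41 is printed as 0 and 3.38 as 3. A minimal sketch of doing the division in floating point while keeping the counters as integers is shown below. The helper name `cycles_per_byte` and its parameters are hypothetical, and 64-bit counters are used on the assumption that `unsigned long` may be only 32 bits on some platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per processed byte, computed in floating point so that values
 * below 1.0 are not truncated to 0 as they would be with integer division. */
double cycles_per_byte(uint64_t start, uint64_t end,
                       uint64_t iterations, uint64_t bytes_per_call)
{
    uint64_t cycles;
    assert(iterations > 0 && bytes_per_call > 0);
    cycles = end - start;  /* unsigned subtraction; 64 bits avoid early wrap */
    return (double)cycles / ((double)iterations * (double)bytes_per_call);
}
```

With this helper, `i` can stay an integer loop counter and only the final ratio is a `double`.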