I am using the rdtsc and cpuid instructions (via volatile inline assembly) to measure the CPU cycles consumed by a program. The rdtsc measurements give realistic results for my programs on Linux (compiled with -O2 -fomit-frame-pointer) and on Windows (compiled with the speed optimization options of the MS Visual Studio 2008 C compiler, i.e. VC 9.0).
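For context, the cpuid+rdtsc serialization idiom I mean is roughly the following (a minimal sketch using stdint.h types rather than my exact macros, which are below):

#include <stdint.h>

static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi;
    /* cpuid is a serializing instruction: it keeps the CPU from
       executing rdtsc out of order with earlier instructions.
       cpuid clobbers eax, ebx, ecx and edx. */
    __asm__ __volatile__
        ("cpuid\n\t"
         "rdtsc"
         : "=a" (lo), "=d" (hi)   /* rdtsc result in edx:eax */
         : "a" (0)                /* cpuid leaf 0 */
         : "%ebx", "%ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}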
Recently, I implemented a new program that relies heavily on table lookups and similar operations. However, the rdtsc measurements of this program, compiled with gcc optimization on Linux, always come out wrong: the reported CPU cycle counts are much smaller than I expect. The rdtsc measurements of the same program on Windows (compiled with the optimizations and compiler mentioned above) are realistic and agree with our expectations.
My question is: is there any way the gcc optimizer could move the volatile assembly instructions somewhere else and thereby produce the behaviour described above?
My code for the timers is given below:
#define TIMER_VARS \
    uint32 start_lo, start_hi; \
    uint32 ticks_lo, ticks_hi

#define TIMER_START() \
    __asm__ __volatile__ \
    ("rdtsc" \
     : "=a" (start_lo), "=d" (start_hi) /* a = eax, d = edx */ \
     : /* no input parameters */ \
     : "%ebx", "%ecx", "memory")

#define TIMER_STOP() \
    __asm__ __volatile__ \
    ("rdtsc" \
     "\n subl %2, %%eax" \
     "\n sbbl %3, %%edx" \
     : "=&a" (ticks_lo), "=&d" (ticks_hi) \
     : "g" (start_lo), "g" (start_hi) \
     : "%ebx", "%ecx", "memory")
I would be very thankful if somebody could suggest some ideas on this.
Thanks,