How to benchmark in a QEMU i386 system using rdtsc

Posted 2020-07-30 06:45

Question:

I am trying to measure the number of clock cycles an operation takes when implemented in two different programming languages, running in the same environment (bare metal, without an OS).

I am running the code in the qemu-system-i386 emulator and using rdtsc to count the clock cycles.

/* Return the number of CPU ticks since boot. */
static inline u64 rdtsc(void)
{
    u32 hi, lo;
    // asm("cpuid");
    /* volatile: stop the compiler from dropping or merging repeated reads */
    asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((u64) lo) | (((u64) hi) << 32);
}

Taking the difference between the rdtsc values read before and after the operation should give the number of clock cycles it took.

    start_time = rdtsc();
    operation();
    stop_time = rdtsc();
    num_cycles = stop_time-start_time;

But the difference is not constant: even across hundreds of iterations it varies by a few thousand cycles.

  • Is there a better way to measure clock cycles?

  • Also, is there a way to pass the CPU frequency to QEMU as an input parameter? Currently I launch it with:

qemu-system-i386 -kernel out.elf

Answer 1:

Trying to benchmark guest software under QEMU emulation is at best extremely difficult. QEMU's emulation does not have performance characteristics that are anything like a real hardware CPU's: some operations that are fast on hardware, like floating point, are very slow on QEMU; we don't model caches and you won't see anything like the performance curves you would see as data sets reach cache line or L1/L2/etc cache size limits; and so on.

Important factors in performance on a modern CPU include (at least):

  • raw instruction counts executed
  • TLB misses
  • branch predictor misses
  • cache misses

QEMU doesn't track any of the last three and only makes a vague attempt at the first one if you use the -icount option. (In particular, without -icount the RDTSC value we provide to the guest under emulation is more-or-less just the host CPU RDTSC value, so times measured with it will include all sorts of QEMU overhead including time spent translating guest code.)

Assuming you're on an x86 host, you could try the -enable-kvm option to run this under a KVM virtual machine. Then at least you'll be looking at the real performance of a hardware CPU, though you will still see some noise from the overhead as other host processes contend for CPU with the VM.
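Since some noise remains even under KVM, one common way to tame it is to repeat the measurement many times and report the minimum or the median rather than a single difference. The following is only a minimal sketch of that idea, not something the original answer prescribes: min_sample and median_sample are hypothetical helper names, and the samples array is assumed to hold the stop_time - start_time values collected by the loop in the question. It uses only freestanding headers, so it should also build without an OS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest of n cycle-count samples (n must be > 0).
 * The minimum approximates the run with the least outside interference. */
static uint64_t min_sample(const uint64_t *samples, size_t n)
{
    uint64_t best = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}

/* Hypothetical helper: median of n samples (n must be > 0).
 * Sorts the array in place with insertion sort; for even n it returns
 * the upper of the two middle values. */
static uint64_t median_sample(uint64_t *samples, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint64_t key = samples[i];
        size_t j = i;
        while (j > 0 && samples[j - 1] > key) {
            samples[j] = samples[j - 1];
            j--;
        }
        samples[j] = key;
    }
    return samples[n / 2];
}
```

The minimum is usually the more stable figure when the noise is purely additive (interrupts, host scheduling), while the median is less sensitive to an occasional unusually low reading. Neither makes numbers taken under pure emulation comparable to real hardware.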