Approximate Number of CPU Cycles for Various Opera

Posted 2019-04-21 15:48

Question:

I am trying to find a reference for approximately how many CPU cycles various operations require.

I don't need exact numbers (as this is going to vary between CPUs) but I'd like something relatively credible that gives ballpark figures that I could cite in discussion with friends.

As an example, we all know that floating point division takes more CPU cycles than, say, doing a bitshift.

I'd guess that the difference is that the division is around 100 cycles, whereas a shift is 1, but I'm looking for something to cite to back that up.

Can anyone recommend such a resource?

Answer 1:

I wrote a small app to test this, a very approximate one built with SynthMaker free edition. In the table below, "e" stands for empty, and the numbers are very approximate cycle counts.

  divide|e:115|10
    mult|e: 48|10
     add|e: 48|10
    subs|e: 50|10
compare>|e: 50|10
     sin|e:135|10

The readings in the cycle analyser vary wildly, from 50 to 100, and are usually one or two times the expected amount, so these figures represent averages. The cycle analyser is a very rough tool, but it gives fair results. For example, a user-made workaround exponent, coded in ASM, that calculates both the exp and the base at audio rate comes to around 800 cycles, so I'd say the figures above are close, to within 50 percent or so. I thought the divide would cost far more! It seems to be only about twice as much.

If you want the file I made, to run in the SM free version, mail me at ant.stewart at the place yahoo dotty com. I was going to save an exe, which is why I made it, but you can't save in the free version, silly me! I am not going to code it from square one in version 1.17 :/



Answer 2:

For x86 processors, see Intel® 64 and IA-32 Architectures Optimization Reference Manual, probably Appendix C.

However, it's not at all easy to figure out how many cycles an instruction takes to execute on a modern x86 processor, as it depends heavily on things such as whether the data is in cache, whether the access is aligned, whether branch prediction fails, whether there's a stall in the instruction pipeline, and quite a lot of other things.



Answer 3:

This is going to be hardware-dependent. The best thing to do is to run some benchmarks on the particular hardware you want to test.

A benchmark would go roughly like this:

  • Run a primitive operation a million times (say, adding two integers).
  • Record the time it took to run (say, in seconds).
  • Multiply by the number of cycles your machine executes per second (its clock frequency) - this will give you the total number of cycles spent.
  • Divide the number from the previous step by 1000000 - this will give you the average number of cycles per operation (dividing the other way round gives operations per cycle). Keep in mind that with pipelining, the cycles per operation could come out at less than 1.
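The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 3 GHz clock frequency is a placeholder you would replace with your own machine's value, the function names are made up for this example, and the result is reported as cycles per operation (the reciprocal of operations per cycle). Subtracting the time of an empty loop is an extra step not in the list above, added so that loop overhead is not counted as part of the operation. Because this runs in the Python interpreter, the figure reflects the cost of a Python-level add (bytecode dispatch included), not of a single machine instruction; for instruction-level numbers the same recipe would need to be written in a compiled language.

```python
import time

# Assumption: a 3 GHz clock. Replace with your CPU's actual frequency.
ASSUMED_CLOCK_HZ = 3.0e9
N_OPS = 1_000_000


def cycles_per_op(elapsed_seconds, clock_hz, n_ops):
    """Convert the time taken by n_ops operations into cycles per operation."""
    total_cycles = elapsed_seconds * clock_hz  # seconds * cycles/second
    return total_cycles / n_ops


def time_loop(n_ops, with_add):
    """Time a loop of n_ops iterations, with or without one integer add each."""
    a, b = 3, 4
    start = time.perf_counter()
    if with_add:
        for _ in range(n_ops):
            c = a + b  # the operation under test
    else:
        for _ in range(n_ops):
            pass  # empty baseline: loop overhead only
    return time.perf_counter() - start


def estimate_add_cycles(n_ops=N_OPS, clock_hz=ASSUMED_CLOCK_HZ, repeats=5):
    """Estimate cycles per add, net of loop overhead.

    Takes the fastest of several runs to reduce noise from other processes.
    """
    baseline = min(time_loop(n_ops, False) for _ in range(repeats))
    with_add = min(time_loop(n_ops, True) for _ in range(repeats))
    net_seconds = max(with_add - baseline, 0.0)  # clamp timing noise at zero
    return cycles_per_op(net_seconds, clock_hz, n_ops)


if __name__ == "__main__":
    estimate = estimate_add_cycles()
    print(f"roughly {estimate:.1f} cycles per Python-level add "
          f"(assuming {ASSUMED_CLOCK_HZ / 1e9:.1f} GHz)")
```

Expect the output to vary from run to run; as the other answers note, cache state, frequency scaling, and pipeline behaviour all affect the measurement, so treat the result as a ballpark figure only.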