making mistake in inline assembler in gcc [duplica

Posted 2019-02-17 18:34

I have successfully written some inline assembler in gcc to rotate right one bit following some nice instructions: http://www.cs.dartmouth.edu/~sergey/cs108/2009/gcc-inline-asm.pdf

Here's an example:

static inline int ror(int v) {
    asm ("ror %0;" :"=r"(v) /* output */ :"0"(v) /* input */ );
    return v;
}

However, I want code to count clock cycles, and the examples I have seen are in a different (probably Microsoft) inline-assembly format. I don't know how to do these things in gcc. Any help?

unsigned __int64 inline GetRDTSC() {
   __asm {
      ; Flush the pipeline
      XOR eax, eax
      CPUID
      ; Get RDTSC counter in edx:eax
      RDTSC
   }
}

I tried:

static inline unsigned long long getClocks() {
    asm("xor %%eax, %%eax" );
    asm(CPUID);
    asm(RDTSC : : %%edx %%eax); //Get RDTSC counter in edx:eax

but I don't know how to get the edx:eax pair to return as 64 bits cleanly, and don't know how to really flush the pipeline.

Also, the best source code I found was at: http://www.strchr.com/performance_measurements_with_rdtsc

and that mentioned the Pentium, so if there are different ways of doing it on different Intel/AMD variants, please let me know. I would prefer something that works on all x86 platforms, even if it's a bit ugly, to a range of solutions for each variant, but I wouldn't mind knowing about it.

2 answers
【Aperson】
#2 · 2019-02-17 18:59

This will store the result in value. Combining the results takes extra cycles, so the number of cycles between calls to this code will be a few less than the difference in results.

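unsigned int hi, lo;
unsigned long long value;
asm volatile (
    "cpuid\n\t"               /* serialize; overwrites eax, ebx, ecx, edx */
    "rdtsc"                   /* counter in edx:eax */
    : "=d" (hi), "=a" (lo)    /* outputs need the "=" modifier */
    : "a" (0)                 /* cpuid leaf 0 */
    : "ebx", "ecx"            /* registers cpuid changes besides the outputs */
);
value = (((unsigned long long)hi) << 32) | lo;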
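To get a rough feel for the overhead mentioned above, one can take two readings back to back and keep the smallest difference seen. This is only a minimal sketch, assuming an x86 or x86-64 target and a GCC-compatible compiler; read_tsc and tsc_overhead are hypothetical names introduced for illustration, not part of the answer.

```c
#include <stdint.h>

/* Hypothetical helper: serialize with CPUID, then read the TSC.
   Assumes an x86 or x86-64 target and GCC-compatible inline asm. */
static inline uint64_t read_tsc(void) {
    uint32_t hi, lo;
    __asm__ volatile (
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)   /* outputs: EDX:EAX holds the counter */
        : "a"(0)               /* input: CPUID leaf 0 */
        : "ebx", "ecx");       /* CPUID also overwrites EBX and ECX */
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: smallest difference between two back-to-back
   readings over `trials` attempts, as an estimate of the overhead.
   Returns UINT64_MAX if trials is zero or negative. */
static uint64_t tsc_overhead(int trials) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = read_tsc();
        uint64_t t1 = read_tsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Subtracting something like tsc_overhead(100) from a measured interval gives a slightly better estimate of the code in between, though the figure varies by CPU and is much larger under virtualization, where CPUID is typically intercepted by the hypervisor.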
混吃等死
#3 · 2019-02-17 19:14

The following does what you want:

static inline unsigned long long rdtsc(void) {
  unsigned int lo, hi;
  asm volatile (
     "cpuid \n"
     "rdtsc" 
   : "=a"(lo), "=d"(hi) /* outputs */
   : "a"(0)             /* inputs */
   : "%ebx", "%ecx");     /* clobbers*/
  return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

It is important to put as little inline ASM as possible in your code, because the compiler treats an asm block as opaque and cannot optimize what is inside it. That's why I've done the shift and OR of the result in C code rather than coding that in ASM as well. Similarly, I use the "a" input of 0 to let the compiler decide when and how to zero out eax. It could be that some other code in your program already zeroed it out, and the compiler could save an instruction if it knows that.

Also, the "clobbers" above are very important. CPUID overwrites everything in eax, ebx, ecx, and edx. You need to tell the compiler that you're changing these registers so that it knows not to keep anything important there. You don't have to list eax and edx because you're using them as outputs. If you don't list the clobbers, there's a serious chance your program will crash and you will find it extremely difficult to track down the issue.
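As an illustration of how a function like the one above might be used, here is a minimal sketch that times a small loop. It assumes x86 or x86-64 with GCC or Clang; cycles_for_sum and the loop it times are hypothetical, and rdtsc is repeated (as static inline) only so the block stands alone. The result is in TSC ticks, which on many recent CPUs advance at a constant rate rather than at the current core clock.

```c
/* Same technique as the answer above, repeated so the sketch stands alone. */
static inline unsigned long long rdtsc(void) {
    unsigned int lo, hi;
    __asm__ volatile (
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)   /* outputs */
        : "a"(0)               /* input: let the compiler zero eax */
        : "ebx", "ecx");       /* clobbers: cpuid overwrites these too */
    return ((unsigned long long)hi << 32) | lo;
}

/* Hypothetical workload: sum the integers 0..n-1 and report the
   TSC ticks it took. The volatile accumulator keeps the compiler
   from optimizing the loop away. */
static unsigned long long cycles_for_sum(unsigned int n,
                                         unsigned long long *sum_out) {
    volatile unsigned long long sum = 0;
    unsigned long long start = rdtsc();
    for (unsigned int i = 0; i < n; i++)
        sum += i;
    unsigned long long end = rdtsc();
    *sum_out = sum;
    return end - start;
}
```

A caller could declare unsigned long long s, call cycles_for_sum(1000, &s), and print the returned count with %llu. Repeating the measurement and keeping the minimum tends to reduce noise from interrupts and cache misses.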
