Which inline assembly code is correct for rdtscp?

Posted 2019-01-11 23:33

Question:

Disclaimer: Words cannot describe how much I detest AT&T-style syntax.

I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.

The first version I used was

static unsigned long long rdtscp(void)
{
    unsigned int hi, lo;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
    return (unsigned long long)lo | ((unsigned long long)hi << 32);
}

I notice there is no clobber list in this version. I don't know whether that is a problem; I suppose it depends on whether the compiler inlines the function. Using this version causes me problems that aren't always reproducible.

The next version I found is

static unsigned long long rdtscp(void)
{
    unsigned long long tsc;
    __asm__ __volatile__(
        "rdtscp;"
        "shl $32, %%rdx;"
        "or %%rdx, %%rax"
        : "=a"(tsc)
        :
        : "%rcx", "%rdx");

    return tsc;
}

This is reassuringly unreadable and official-looking, but like I said, my issue isn't always reproducible, so I'm merely trying to rule out one possible cause of my problem.

The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.

What's correct... version 1, or version 2, or both?

Answer 1:

Here's C++ code that returns the TSC and stores the 32-bit auxiliary value (IA32_TSC_AUX) in the reference parameter:

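#include <cstdint>

// x86-64, GCC/Clang: RDTSCP returns the TSC in EDX:EAX and IA32_TSC_AUX in ECX.
static inline uint64_t rdtscp( uint32_t & aux )
{
    uint64_t rax,rdx;
    // All three written registers are outputs, so no clobber list is needed.
    asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return (rdx << 32) + rax;
}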

It is better to do the shift and add that merge the two 32-bit halves in a C++ statement rather than inside the inline assembly; this lets the compiler schedule those instructions as it sees fit.
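One possible way to use such a wrapper, sketched for x86-64 with GCC or Clang: read the counter before and after a piece of work and compare the two auxiliary values. `TickSpan` and `measure` are hypothetical names introduced here for illustration, not part of the answer. Treating an unchanged auxiliary value as "same core" assumes the OS stores a per-CPU ID in IA32_TSC_AUX, which holds on Linux but may not hold elsewhere.

```cpp
#include <cstdint>

// Same wrapper as in the answer: EDX:EAX hold the counter, ECX holds IA32_TSC_AUX.
static inline uint64_t rdtscp(uint32_t &aux)
{
    uint64_t rax, rdx;
    asm volatile("rdtscp" : "=a"(rax), "=d"(rdx), "=c"(aux));
    return (rdx << 32) + rax;
}

// Hypothetical result type, used only in this sketch.
struct TickSpan {
    uint64_t ticks;   // end - begin, in TSC ticks (not nanoseconds)
    bool same_aux;    // true if IA32_TSC_AUX was identical at both reads
};

// Hypothetical helper: runs work() once between two RDTSCP reads.
template <typename F>
static TickSpan measure(F &&work)
{
    uint32_t aux_begin = 0, aux_end = 0;
    const uint64_t begin = rdtscp(aux_begin);
    work();
    const uint64_t end = rdtscp(aux_end);
    return TickSpan{end - begin, aux_begin == aux_end};
}
```

If `same_aux` comes back false, the thread probably moved to another core between the two reads and the tick count is best discarded. Converting ticks to time needs the TSC frequency, which this sketch does not attempt.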



Answer 2:

According to this, RDTSCP overwrites EDX and ECX (in addition to EAX). Any of those registers that is not already an output operand has to be marked as clobbered, which is what the second version does. BTW, is this the link where you got the above code, or did you find it elsewhere? It also shows a few other variations for timing, which is pretty neat.
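As a sketch of that advice applied to the first version from the question (GCC or Clang assumed): EAX and EDX are already output operands there, so the only register left to declare is ECX. `rdtscp_v1_fixed` and `sum_with_stamp` are hypothetical names used only for illustration, and the second function assumes the System V x86-64 calling convention, where the fourth integer argument arrives in RCX.

```cpp
// Version 1 from the question with the missing clobber added.
static unsigned long long rdtscp_v1_fixed(void)
{
    unsigned int hi, lo;
    // RDTSCP writes EAX (low half), EDX (high half) and ECX (IA32_TSC_AUX).
    // lo and hi cover EAX and EDX; "ecx" tells the compiler ECX is overwritten.
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi) : : "ecx");
    return (unsigned long long)lo | ((unsigned long long)hi << 32);
}

// Hypothetical caller: under the System V x86-64 ABI, d is passed in RCX.
// With the clobber declared, the compiler knows it cannot rely on RCX
// still holding d after the asm statement, even if the call is inlined.
static unsigned long long sum_with_stamp(unsigned long long a, unsigned long long b,
                                         unsigned long long c, unsigned long long d,
                                         unsigned long long *tsc)
{
    *tsc = rdtscp_v1_fixed();
    return a + b + c + d;
}
```

Without the clobber, the compiler is free to assume ECX is untouched by the asm statement, which matches the symptom described in the question: a value that lived in that register can silently change once the function is inlined.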