I've been looking for a way to use BSWAP on the lower 32-bit sub-register of a 64-bit register. For example, 0x0123456789abcdef is in the RAX register, and I want to change it to 0x01234567efcdab89 with a single instruction (for performance reasons).
So I tried the following inline-asm macro:
#define BSWAP(T) { \
__asm__ __volatile__ ( \
"bswap %k0" \
: "=q" (T) \
: "q" (T)); \
}
And the result was 0x00000000efcdab89. I don't understand why the compiler acts like this. Does anybody know an efficient solution?
Ah, yes, I understand the problem now:
on x86-64, any instruction that writes a 32-bit register (%eax, %ebx, etc.) implicitly zero-extends the result into the full 64-bit register. As I understand it, this was a design choice to avoid partial-register dependencies on the upper half, not something the compiler is doing.
So I'm afraid that there is no way to do bswap on just the lower 32 bits of a 64-bit register while keeping the upper 32 bits. You'll have to use a series of several instructions...
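One possible multi-instruction workaround, as a rough sketch: swap the low half with the GCC/Clang builtin `__builtin_bswap32` and then merge it back with the untouched upper half. The function name `bswap_low32` is made up for illustration, and I haven't measured how this compares to hand-written asm.

```c
#include <stdint.h>

/* Hypothetical helper: byte-swap only the low 32 bits of x,
 * leaving the upper 32 bits unchanged. */
static inline uint64_t bswap_low32(uint64_t x)
{
    /* Swap the low half as a 32-bit value (typically compiles to bswap on x86). */
    uint32_t lo = __builtin_bswap32((uint32_t)x);

    /* Keep the original upper half and merge in the swapped low half. */
    return (x & 0xFFFFFFFF00000000ULL) | (uint64_t)lo;
}
```

With the value from the question, `bswap_low32(0x0123456789abcdefULL)` gives `0x01234567efcdab89`. Writing it in plain C also lets the compiler pick the instruction sequence itself instead of being locked into a fixed asm statement.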
Check the assembly output generated by gcc! Use the gcc -S flag (capital S) to compile the code and generate asm output.
IIRC, x86-64 uses a 32-bit operand size by default when not explicitly directed to do otherwise, so this may be (part of) the problem.