I'm trying to use cmpxchg with inline assembly through C. This is my code:
static inline int
cas(volatile void* addr, int expected, int newval) {
int ret;
asm volatile("movl %2 , %%eax\n\t"
"lock; cmpxchg %0, %3\n\t"
"pushfl\n\t"
"popl %1\n\t"
"and $0x0040, %1\n\t"
: "+m" (*(int*)addr), "=r" (ret)
: "r" (expected), "r" (newval)
: "%eax"
);
return ret;
}
This is my first time using inline asm and I'm not sure what could be causing this problem. I tried `cmpxchgl` as well, but still nothing. I also tried removing the `lock`. I get "operand size mismatch". I think maybe it has something to do with the casting I do to `addr`, but I'm unsure. I'm trying to exchange an int for an int, so I don't really understand why there would be a size mismatch. This is using AT&T style. Thanks
You have the operand order for the `cmpxchg` instruction reversed. AT&T syntax needs the memory destination last: `lock; cmpxchg %3, %0`.
Or you could compile that instruction with its original operand order using `-masm=intel`, but the rest of your code is AT&T syntax and ordering, so that's not the right answer. As far as why it says "operand size mismatch", I can only say that that appears to be an assembler bug, in that it uses the wrong message.
As @prl points out, you reversed the operands, putting them in Intel order (see Intel's manual entry for `cmpxchg`). Any time your inline asm doesn't assemble, you should look at the asm the compiler was feeding to the assembler to see what happened to your template. In your case, simply remove the `static inline` so the compiler will make a stand-alone definition; then you can look at what it emits on the Godbolt compiler explorer. Sometimes that will clue your eye / brain in cases where staring at `%3` and `%0` didn't, especially after you check the instruction-set reference manual entry for `cmpxchg` and see that the memory operand is the destination (Intel-syntax first operand, AT&T-syntax last operand).

This makes sense because the explicit register operand is only ever a source, while EAX and the memory operand are both read, and then one or the other is written depending on the success of the compare. (Semantically, you use `cmpxchg` as a conditional store to a memory destination.)

You're discarding the load result from the CAS-failure case. I can't think of any use-case for `cmpxchg` where doing a separate load of the atomic value would be incorrect, rather than just inefficient, but the usual semantics for a CAS function is that `oldval` is taken by reference and updated on failure. (At least that's how C++11 `std::atomic` and C11 stdatomic do it, with `bool atomic_compare_exchange_weak( volatile A *obj, C* expected, C desired );`.)

(The weak/strong distinction allows better code-gen for CAS retry loops on targets that use LL/SC, where spurious failure is possible due to an interrupt or the value being rewritten with the same value. x86's `lock cmpxchg` is "strong".)

Actually, GCC's legacy `__sync` builtins provide two separate CAS functions: one that returns the old value, and one that returns a `bool`. Both take the old/new value by reference. So it's not the same API that C++11 uses, but apparently it isn't so horrible that nobody used it.

Your overcomplicated code isn't portable to x86-64. From your use of `popl`, I assume you developed it on x86-32. You don't need `pushf`/`pop` to get ZF as an integer; that's what `setcc` is for. "cmpxchg example for 64 bit integer" has a 32-bit example that works that way (to show what they want a 64-bit version of). Or even better, use GCC6 flag-output syntax so that using this in a loop can compile to a `cmpxchg` / `jne` loop instead of `cmpxchg` / `setz %al` / `test %al,%al` / `jnz`.

We can fix all of those problems and improve the register allocation as well. (If the first or last instruction of an inline-asm statement is a `mov`, you're probably using constraints inefficiently.)

Of course, by far the best thing for real usage would be to use C11 stdatomic or a GCC builtin. See https://gcc.gnu.org/wiki/DontUseInlineAsm: in cases where the compiler can emit just-as-good (or better) asm from code it "understands", inline asm only constrains the compiler. It's also difficult to write correctly and efficiently, and to maintain.
Portable to i386 and x86-64, AT&T or Intel syntax, and works for any integer type of register width or smaller:
The `{ foo,bar | bar,foo }` is ASM dialect alternatives; for x86, it's `{AT&T | Intel}`. The `%[newval]` is a named operand constraint; it's another way to keep your operands straight. The `"=@ccz"` takes the `z` condition code as the output value, like a `setz`.

You can see how this compiles for 32-bit x86 with AT&T output on Godbolt.
gcc is dumb and stores a `0` in one reg before copying it to `eax`, instead of re-zeroing `eax` inside the loop. This is why it needs to save/restore EBX at all. It's the same asm we get from avoiding inline asm, though (from "x86 spinlock using cmpxchg").

Someone should teach gcc that Intel CPUs can materialize a `0` more cheaply with xor-zeroing than they can copy it with `mov`, especially on Sandybridge (`xor`-zeroing elimination but no `mov`-elimination).