clang / gcc: Some inline assembly operands can be satisfied by multiple constraints, e.g., "rm", when an operand can be either a register or a memory location. As an example, consider the 64 x 64 = 128-bit multiply:
__asm__ ("mulq %q3" : "=a" (rl), "=d" (rh) : "%0" (x), "rm" (y) : "cc");
The generated code appears to choose the memory constraint for operand 3 (y). That would make sense if we were register-starved, to avoid a spill, but there is obviously less register pressure on x86-64 than on IA32. However, the assembly snippet generated by clang is:
movq %rcx, -8(%rbp)
## InlineAsm Start
mulq -8(%rbp)
## InlineAsm End
Choosing a memory constraint here is clearly pointless: y is already in %rcx, so it is stored to the stack only to be read straight back. Changing the constraint to "r" (y), however (forcing a register), we get:
## InlineAsm Start
mulq %rcx
## InlineAsm End
as expected. These results are for clang / LLVM 3.2 (the current Xcode release). The first question: why would clang select the less efficient constraint in this case?
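For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the "rm" version wrapped in a function. The names mul64x64_rm and check_mul64x64_rm are hypothetical, and both the non-x86-64 branch and the cross-check assume the compiler provides unsigned __int128 (gcc and clang do on 64-bit targets). Compiling it with something like clang -O2 -S and reading the output shows which alternative of "rm" was picked for y; changing "rm" to "r" gives the register form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the low 64 bits of x * y and stores the
   high 64 bits in *hi. */
static inline uint64_t mul64x64_rm(uint64_t x, uint64_t y, uint64_t *hi)
{
    uint64_t rl, rh;
#if defined(__x86_64__)
    /* mulq multiplies %rax by its operand; the 128-bit product ends up in
       %rdx:%rax. "%0" ties x to operand 0 (%rax) and marks it commutative
       with y; "rm" lets the compiler place y in a register or in memory. */
    __asm__ ("mulq %q3"
             : "=a" (rl), "=d" (rh)
             : "%0" (x), "rm" (y)
             : "cc");
#else
    /* Fallback for other targets; assumes unsigned __int128 exists. */
    unsigned __int128 p = (unsigned __int128)x * y;
    rl = (uint64_t)p;
    rh = (uint64_t)(p >> 64);
#endif
    *hi = rh;
    return rl;
}

/* Cross-checks the inline asm against the compiler's own 128-bit multiply
   (again assuming unsigned __int128 is available). */
static void check_mul64x64_rm(uint64_t x, uint64_t y)
{
    uint64_t hi;
    uint64_t lo = mul64x64_rm(x, y, &hi);
    unsigned __int128 p = (unsigned __int128)x * y;
    assert(lo == (uint64_t)p);
    assert(hi == (uint64_t)(p >> 64));
}
```

For example, mul64x64_rm(UINT64_MAX, 2, &hi) returns UINT64_MAX - 1 and sets hi to 1, since the product is 2^65 - 2.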
Secondly, there is the less widely used, comma-separated, multiple-alternative constraint syntax:
"r,m" (y)
which should evaluate the cost of each alternative and choose the one that results in the least copying. This appears to work, but clang simply chooses the first alternative, as evidenced by "m,r" (y), which brings back the memory operand.
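A sketch of the same multiply written with the multiple-alternative syntax. Per the GCC documentation, every operand of the statement must list the same number of alternatives, so the outputs and the tied input repeat their constraint ("=a,a", "%0,0") alongside "r,m" for y. The helper names are hypothetical, and the fallback and cross-check assume unsigned __int128 is available; swapping "r,m" for "m,r" reproduces the experiment described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: alternative 1 puts y in a register, alternative 2
   puts it in memory. Every operand lists exactly two alternatives. */
static inline uint64_t mul64x64_alt(uint64_t x, uint64_t y, uint64_t *hi)
{
    uint64_t rl, rh;
#if defined(__x86_64__)
    __asm__ ("mulq %q3"
             : "=a,a" (rl), "=d,d" (rh)
             : "%0,0" (x), "r,m" (y)
             : "cc");
#else
    /* Fallback for other targets; assumes unsigned __int128 exists. */
    unsigned __int128 p = (unsigned __int128)x * y;
    rl = (uint64_t)p;
    rh = (uint64_t)(p >> 64);
#endif
    *hi = rh;
    return rl;
}

/* Quick self-check against the compiler's 128-bit multiply. */
static void check_mul64x64_alt(uint64_t x, uint64_t y)
{
    uint64_t hi;
    uint64_t lo = mul64x64_alt(x, y, &hi);
    unsigned __int128 p = (unsigned __int128)x * y;
    assert(lo == (uint64_t)p && hi == (uint64_t)(p >> 64));
}
```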
I could simply drop the "m" alternative constraints, but that wouldn't express the full range of legal operands. This brings me to the second question: have these issues been resolved, or at least acknowledged, in 3.3? I've tried looking through the LLVM dev archives, but I'd rather solicit some answers before unnecessarily restricting constraints further, or joining project discussions, etc.