error: impossible register constraint in 'asm&

I've run into a problem while compiling a package around, I am not really a good coder, but I tried fixing it for my self, and it still won't compile. This is the original bit of code.:

#ifdef __GNUC__
asm("and $3, %%ecx;"
"shl $3 ,%%ecx;"
"ror %%cl, %0"
: "=r" (value)
: "r" (value), "c" (address));
#else

The error is .:

GBAinline.h:139: error: impossible register constraint in 'asm' ( ifdef line is 138 )

And this is how I tried to make it look.:

#ifdef __GNUC__
asm ("and $3 %%ecx,shl $3 %%ecx,ror %%cl, %0" : "=r" (value): "r" (value), "c" (address));

#else

Still, it would not work. It's a gba emulator before anyone ask, VBA, and this is part of GBAinline.h . This assembler is making me crazy already.

Edit.: The problem above was handled fine, I just was not paying attention to which compiler I was using. But now I get this error on this bit of code from a header file, I've put it on pastebin, to keep things here a bit more tidy... ( Sorry if this is wrong, i can change that later )

This is the header that has the lines that results in errors.: http://pastebin.com/k3D4cg0d

And this is the C file it refers to.: http://pastebin.com/Ymg1X5dg

This is giving an error like this.:

/var/tmp/cc3zA0lH.s: Assembler messages: /var/tmp/cc3zA0lH.s:69: Error: bad instruction `sw $3,0(r3)',

And so on for the rest of those lines.

That inline assembly is buggy:

It uses multi-line strings which effectively concatenate. Without \n all appears on one line. Whether your assembler accepts statements separated by semicolons makes all the difference there ... some may not.
It specifies the same variable as input/output constraint instead of using "+r"(value) as ordinarily suggested for this situation.

Without seeing the rest of the code it's not quite clear why the inline assembly statement looks the way it does; Personally, I'd suggest to write it like:

asm("ror %%cl, %0" : "+r"(value) : "c"((((uintptr_t)address) & 3) << 3)));

because there's little need to do the calculation itself in assembly. The uintptr_t (from <stdint.h>) cast makes this 32/64bit agnostic as well.

Edit:

If you want it for a different CPU but x86 / x64, then it obviously needs to be different ... For ARM (not Thumb2), it'd be:

asm("ROR %0, %0, %1" : "+r"(value) : "r"((((uintptr_t)address) & 3) << 3)));

since that's how the rotate instruction there behaves.

Edit (add reference):

Regarding the operation performed here as such, this blog post gives an interesting perspective - namely, that the compiler is quite likely to create the same output for:

(a >> shift | a << (8 * sizeof(a) - shift))

as for the x86 inline

asm("ror %%cl, %0" : "+r"(a) : "c"(shift))

Testing this:

#include <stdint.h>

int main(int argc, char **argv)
{
    unsigned int shift = (int)((((uintptr_t)argv) & 3) << 3);
    unsigned int a = argc;
#ifdef USE_ASM
    /*
     * Mark the assembly version with a "nop" instruction in output
     */
    asm("nop\n\t"
        "ror        %%cl, %0" : "+r"(a) : "c"(shift));
    return a;
#else
    return (a >> shift | a << (8 * sizeof(a) - shift));
#endif
}

Compile / disassemble it:

$ gcc -DUSE_ASM -O8 -c tf.c; objdump -d tf.o

tf.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 :
   0:   83 e6 03                and    $0x3,%esi
   3:   8d 0c f5 00 00 00 00    lea    0x0(,%rsi,8),%ecx
   a:   90                      nop
   b:   d3 cf                   ror    %cl,%edi
   d:   89 f8                   mov    %edi,%eax
   f:   c3                      retq
$ gcc -O8 -c tf.c; objdump -d tf.o

tf.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 :
   0:   83 e6 03                and    $0x3,%esi
   3:   8d 0c f5 00 00 00 00    lea    0x0(,%rsi,8),%ecx
   a:   d3 cf                   ror    %cl,%edi
   c:   89 f8                   mov    %edi,%eax
   e:   c3                      retq

Ergo, this inline assembly is unnecessary.