可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I guess you all heard of the 'swap problem'; SO is full of questions about it. The version of the swap without use of a third variable is often considered to be faster since, well, you have one variable less. I wanted to know what was going on behind the curtains and wrote the following two programs:

int main () {
    int a = 9;
    int b = 5;
    int swap;

    swap = a;
    a = b;
    b = swap;

    return 0;
}

and the version without third variable:

int main () {
    int a = 9;
    int b = 5;

    a ^= b;
    b ^= a;
    a ^= b;

    return 0;
}

I generated the assembly code using clang and got this for the first version (that uses a third variable):

...
Ltmp0:
    movq   %rsp, %rbp
Ltmp1:
    movl   $0, %eax
    movl   $0, -4(%rbp)
    movl   $9, -8(%rbp)
    movl   $5, -12(%rbp)
    movl   -8(%rbp), %ecx
    movl   %ecx, -16(%rbp)
    movl   -12(%rbp), %ecx
    movl   %ecx, -8(%rbp)
    movl   -16(%rbp), %ecx
    movl   %ecx, -12(%rbp)
    popq   %rbp
    ret
Leh_func_end0:
...

and this for the second version (that does not use a third variable):

...
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    movl   $0, %eax
    movl   $0, -4(%rbp)
    movl   $9, -8(%rbp)
    movl   $5, -12(%rbp)
    movl   -12(%rbp), %ecx
    movl   -8(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -8(%rbp)
    movl   -8(%rbp), %ecx
    movl   -12(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -12(%rbp)
    movl   -12(%rbp), %ecx
    movl   -8(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -8(%rbp)
    popq   %rbp
    ret
Leh_func_end0:
...

The second one is longer but I don't know much about assembly code so I have no idea if that means that it is slower so I'd like to hear the opinion of someone more knowledgable about it.

Which of the above versions of a variable swap is faster and takes less memory?

回答1:

Look at some optimised assembly. From

void swap_temp(int *restrict a, int *restrict b){
    int temp = *a;
    *a = *b;
    *b = temp;
}

void swap_xor(int *restrict a, int *restrict b){
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

gcc -O3 -std=c99 -S -o swapping.s swapping.c produced

    .file   "swapping.c"
.text
.p2align 4,,15
.globl swap_temp
.type   swap_temp, @function
swap_temp:
.LFB0:
.cfi_startproc
movl    (%rdi), %eax
movl    (%rsi), %edx
movl    %edx, (%rdi)
movl    %eax, (%rsi)
ret
.cfi_endproc
.LFE0:
.size   swap_temp, .-swap_temp
.p2align 4,,15
.globl swap_xor
.type   swap_xor, @function
swap_xor:
.LFB1:
.cfi_startproc
movl    (%rsi), %edx
movl    (%rdi), %eax
xorl    %edx, %eax
xorl    %eax, %edx
xorl    %edx, %eax
movl    %edx, (%rsi)
movl    %eax, (%rdi)
ret
.cfi_endproc
.LFE1:
.size   swap_xor, .-swap_xor
.ident  "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section    .comment.SUSE.OPTs,"MS",@progbits,1
.string "Ospwg"
.section    .note.GNU-stack,"",@progbits

To me, swap_temp looks as efficient as can be.

回答2:

The problem with XOR swap trick is that it's strictly sequential. It may seem deceptively fast, but in reality, it is not. There's an instruction called XCHG that swaps two registers, but this can also be slower than simply using 3 MOVs, due to its atomic nature. The common technique with temp is an excellent choice ;)

回答3:

To get an idea of the cost imagine that every command has a cost to be performed and also the indirect addressing has its own cost.

movl   -12(%rbp), %ecx

This line will need something like a time unit for accessing the value in ecx register, one time unit for accessing rbp, another one for applying the offset (-12) and more time units (let's say arbitrarily 3) for moving the value from the address stored in ecx to the address indicated from -12(%rbp).

If you count all the operations in every line and all line, the second method is for sure costlier than the first one.