I want to swap two integers, and I want to know which of these two implementations will be faster: The obvious way with a temp variable:
void swap(int* a, int* b)
{
int temp = *a;
*a = *b;
*b = temp;
}
Or the xor version that I'm sure most people have seen:
void swap(int* a, int* b)
{
*a ^= *b;
*b ^= *a;
*a ^= *b;
}
It seems like the first uses an extra register, but the second one is doing three loads and stores while the first only does two of each. Can someone tell me which is faster and why? The why being more important.
Another beautiful way.
Advantage
No need of function call and handy.
Drawback:
This fails when both inputs are same variable. It can be used only on integer variables.
In my opinion local optimizations like this should only be considered tightly related to the platform. It makes a huge difference if you are compiling this on a 16 bit uC compiler or on gcc with x64 as target.
If you have a specific target in mind then just try both of them and look at the generated asm code or profile your applciation with both methods and see which is actually faster on your platform.
For those to stumble upon this question and decide to use the XOR method. You should consider inlining your function or using a macro to avoid the overhead of a function call:
The first is faster because bitwise operations such as xor are usually very hard to visualize for the reader.
Faster to understand of course, which is the most important part ;)
All the top rated answers are not actually definitive "facts"... they are people who are speculating!
You can definitively know for a fact which code takes less assembly instructions to execute because you can look at the output assembly generated by the compiler and see which executes in less assembly instructions!
Here is the c code I compiled with flags "gcc -std=c99 -S -O3 lookingAtAsmOutput.c":
ASM output for swap_traditional() takes >>> 11 <<< instructions ( not including "leave", "ret", "size"):
ASM output for swap_xor() takes >>> 11 <<< instructions not including "leave" and "ret":
Summary of assembly output:
swap_traditional() takes 11 instructions
swap_xor() takes 11 instructions
Conclusion:
Both methods use the same amount of instructions to execute and therefore are approximately the same speed on this hardware platform.
Lesson learned:
When you have small code snippets, looking at the asm output is helpful to rapidly iterate your code and come up with the fastest ( i.e. least instructions ) code. And you can save time even because you don't have to run the program for each code change. You only need to run the code change at the end with a profiler to show that your code changes are faster.
I use this method a lot for heavy DSP code that needs speed.
Below piece of code will do the same. This snippet is optimized way of programming as it doesn't use any 3rd variable.