What is the fastest way to swap two non-overlapping memory areas of equal size? Say, I need to swap (t_Some *a) with (t_Some *b). Considering space-time trade-off, will increased temporary space improve the speed? For example, (char *tmp) vs (int *tmp)? I am looking for a portable solution.
Prototype:
void swap_elements_of_array(void* base, size_t size_of_element, int a, int b);
The fastest way to move a block of memory is going to be memcpy() from <string.h>. If you memcpy() from a to temp, memmove() from b to a, then memcpy() from temp to b, you’ll have a swap that uses the optimized library routines, which the compiler probably inlines. You wouldn’t want to copy the entire block at once, but in vector-sized chunks.

In practice, if you write a tight loop, the compiler can probably tell that you’re swapping every element of the arrays and optimize accordingly. On most modern CPUs, you want to generate vector instructions. The compiler might be able to generate faster code if you make sure all three buffers are aligned.
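The three-copy swap described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: swap_via_temp is a hypothetical name, and the temporary is assumed to come from malloc() and to hold the whole block at once, which is exactly the simple form that a chunked version would improve on.

```c
#include <stdlib.h>
#include <string.h>

/* Swap two non-overlapping blocks of n bytes through a heap temporary.
 * Returns 0 on success, -1 if the temporary cannot be allocated.
 * swap_via_temp is a hypothetical helper, not a library function. */
int swap_via_temp(void *a, void *b, size_t n)
{
    if (n == 0 || a == b)
        return 0;               /* nothing to do */

    void *tmp = malloc(n);
    if (tmp == NULL)
        return -1;

    memcpy(tmp, a, n);          /* a    -> temp */
    memmove(a, b, n);           /* b    -> a    */
    memcpy(b, tmp, n);          /* temp -> b    */

    free(tmp);
    return 0;
}
```

Because the two areas are stated to be non-overlapping, memcpy() would also be valid for the middle step; memmove() is kept here only to mirror the description.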
However, what you really want to do is make things easier for the optimizer. Take this program:
If you translate that into machine code as literally written, it’s a terrible algorithm, copying one byte at a time, doing two increments per iteration, and so on. In practice, though, the compiler sees what you’re really trying to do.
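A program of the kind described might look like the sketch below. It is only an illustration: swap_bytes is a hypothetical name, the restrict qualifiers are an assumption that encodes the "non-overlapping" requirement, and this is not necessarily the exact code behind the compiler output discussed next.

```c
#include <stddef.h>

/* Naive swap: one byte per iteration, two pointer increments per pass.
 * restrict promises the compiler that the two blocks do not overlap,
 * which is what makes auto-vectorization straightforward. */
void swap_bytes(void *restrict a, void *restrict b, size_t n)
{
    unsigned char *p = a;
    unsigned char *q = b;

    for (unsigned char *end = p + n; p != end; ++p, ++q) {
        unsigned char t = *p;   /* save byte from a */
        *p = *q;                /* b -> a           */
        *q = t;                 /* saved byte -> b  */
    }
}
```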
In clang 5.0.1 with -std=c11 -O3, it produces (in part) the following inner loop on x86_64:

Whereas gcc 7.2.0 with the same flags also vectorizes, unrolling the loop less:
Convincing the compiler to produce instructions that work on a single word at a time, instead of vectorizing the loop, is the opposite of what you want!
Word writes will be the fastest. However, both block size and alignment need to be considered. In practice things are usually aligned sensibly, but you shouldn't count on it. memcpy() safely handles everything and may be specialized (built-in) for constant sizes within reason.

Here is a portable solution that works reasonably well in most cases.
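One way to realize this, as a minimal sketch rather than a definitive implementation: swap through a small fixed-size stack buffer with memcpy(), one chunk at a time, so alignment never matters and no heap allocation is needed. The helper swap_blocks and the 256-byte chunk size are assumptions made for illustration; only the swap_elements_of_array prototype comes from the question.

```c
#include <stddef.h>
#include <string.h>

/* Swap two non-overlapping blocks of n bytes in fixed-size chunks.
 * The 256-byte buffer is an assumed size; tune it for the target. */
static void swap_blocks(void *a, void *b, size_t n)
{
    unsigned char *p = a;
    unsigned char *q = b;
    unsigned char tmp[256];

    while (n > 0) {
        size_t k = n < sizeof tmp ? n : sizeof tmp;
        memcpy(tmp, p, k);      /* a chunk -> tmp     */
        memcpy(p, q, k);        /* b chunk -> a chunk */
        memcpy(q, tmp, k);      /* tmp     -> b chunk */
        p += k;
        q += k;
        n -= k;
    }
}

/* Swap elements a and b of an array whose elements are
 * size_of_element bytes each (prototype from the question). */
void swap_elements_of_array(void *base, size_t size_of_element, int a, int b)
{
    if (a == b)
        return;                 /* same element: blocks would overlap */

    unsigned char *bytes = base;
    swap_blocks(bytes + (size_t)a * size_of_element,
                bytes + (size_t)b * size_of_element,
                size_of_element);
}
```

For small elements the loop runs once with a short, variable-length memcpy(); when the element size is a compile-time constant at the call site and the function is inlined, the compiler has a chance to replace those calls with plain loads and stores.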