Is it better to use memcpy as shown below, or is it better to use std::copy(), in terms of performance? Why?
char *bits = NULL;
...
bits = new (std::nothrow) char[((int *) copyMe->bits)[0]];
if (bits == NULL)
{
    cout << "ERROR Not enough memory.\n";
    exit(1);
}
memcpy(bits, copyMe->bits, ((int *) copyMe->bits)[0]);
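For comparison, the std::copy equivalent of that final memcpy line would be along these lines (a sketch reusing the question's own expressions; requires <algorithm>):

std::copy(copyMe->bits, copyMe->bits + ((int *) copyMe->bits)[0], bits);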
Just a minor addition: the speed difference between memcpy() and std::copy() can vary quite a bit depending on whether optimizations are enabled or disabled. With g++ 6.2.0 and without optimizations, memcpy() clearly wins. When optimizations are enabled (-O3), everything looks pretty much the same again. The bigger the array, the less noticeable the effect gets, but even at N=1000, memcpy() is about twice as fast when optimizations aren't enabled.

Source code (requires Google Benchmark):
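The original listing was not preserved, so the following is a minimal sketch of such a benchmark; the function names and buffer sizes are assumptions, not the answerer's exact code.

#include <benchmark/benchmark.h>
#include <algorithm>
#include <cstring>
#include <vector>

// Benchmark memcpy on a char buffer of N = state.range(0) bytes.
static void BM_memcpy(benchmark::State& state) {
    std::vector<char> src(state.range(0)), dst(state.range(0));
    for (auto _ : state) {
        std::memcpy(dst.data(), src.data(), src.size());
        benchmark::DoNotOptimize(dst.data());   // keep the copy from being elided
    }
}
BENCHMARK(BM_memcpy)->Arg(10)->Arg(100)->Arg(1000);

// Same buffer sizes, copied with std::copy instead.
static void BM_std_copy(benchmark::State& state) {
    std::vector<char> src(state.range(0)), dst(state.range(0));
    for (auto _ : state) {
        std::copy(src.begin(), src.end(), dst.begin());
        benchmark::DoNotOptimize(dst.data());
    }
}
BENCHMARK(BM_std_copy)->Arg(10)->Arg(100)->Arg(1000);

BENCHMARK_MAIN();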
Profiling shows that the statement "std::copy() is always as fast as memcpy(), or faster" is false.

Red Alert pointed out that my original code used memcpy from array to array but std::copy from array to vector, and that this could be a reason why memcpy was faster. Since there is v.reserve(sizeof(arr1)), there should be no difference between copying to a vector and copying to an array. The code has therefore been fixed to use an array in both cases; memcpy is still faster. The fixed test code (language: C++):
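The answerer's exact listing was lost, so here is a minimal sketch of this kind of test, under assumed buffer sizes and iteration counts (array-to-array for both copies, per the fix above):

#include <algorithm>
#include <chrono>
#include <cstring>
#include <iostream>

int main() {
    static char arr1[64 * 1024];
    static char arr2[64 * 1024];
    const int iterations = 100000;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        std::memcpy(arr2, arr1, sizeof(arr1));
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        std::copy(arr1, arr1 + sizeof(arr1), arr2);
    auto t2 = std::chrono::steady_clock::now();

    // Touch the destination so the optimizer cannot discard the copies.
    std::cout << static_cast<int>(arr2[0]) << '\n';
    std::cout << "memcpy:    " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    std::cout << "std::copy: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
}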
All compilers I know of will replace a simple std::copy with a memcpy when it is appropriate, or, even better, vectorize the copy so that it is even faster than a memcpy.

In any case: profile and find out for yourself. Different compilers do different things, and it's quite possible a compiler won't do exactly what you expect. See this presentation on compiler optimisations (PDF).
Here's what GCC does for a simple std::copy of a POD type.
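The original function listing was not preserved; as a minimal sketch, assume something like the following, where foo is a hypothetical trivially copyable struct:

#include <algorithm>
#include <cstddef>

struct foo
{
    int x, y;
};

// Compiled with only -O, GCC turns this std::copy into a call to memmove.
void copy_foos(foo* dst, const foo* src, std::size_t n)
{
    std::copy(src, src + n, dst);
}

// Hypothetical variant: with __restrict-qualified pointers (a GCC
// extension promising the compiler that the two ranges cannot
// overlap), the memmove can become a memcpy.
void copy_foos_restrict(foo* __restrict dst, const foo* __restrict src, std::size_t n)
{
    std::copy(src, src + n, dst);
}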
The disassembly (with only -O optimisation) shows the call to memmove. If you change the function signature so that the compiler knows the ranges cannot overlap, as in the __restrict-qualified variant sketched above, then the memmove becomes a memcpy, for a slight performance improvement. Note that memcpy itself will be heavily vectorised.

In theory, memcpy might have a slight, imperceptible, infinitesimal performance advantage, only because it doesn't have the same requirements as std::copy. The man page for memcpy states that the source and destination memory areas must not overlap. In other words, memcpy can ignore the possibility of overlapping data. (Passing overlapping arrays to memcpy is undefined behavior.) So memcpy doesn't need to explicitly check for this condition, whereas std::copy can be used as long as the OutputIterator parameter is not in the source range. Note that this is not the same as saying that the source range and destination range can't overlap.
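To make the overlap rules concrete, here is a small illustration (hypothetical, not from the original answer):

#include <algorithm>
#include <cstring>

void shift_left_two()
{
    int arr[8] = {0, 1, 2, 3, 4, 5, 6, 7};

    // OK: the destination iterator (arr) is not inside the source
    // range [arr + 2, arr + 8), even though the regions overlap.
    std::copy(arr + 2, arr + 8, arr);

    // Undefined behavior: memcpy forbids any overlap.
    // std::memcpy(arr, arr + 2, 6 * sizeof(int));
    // memmove is the overlap-safe C alternative:
    std::memmove(arr, arr + 2, 6 * sizeof(int));
}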
So since std::copy has somewhat different requirements, in theory it should be slightly (with an extreme emphasis on "slightly") slower, since it probably will check for overlapping C-arrays, or else delegate the copying of C-arrays to memmove, which needs to perform the check. But in practice, you (and most profilers) probably won't even detect any difference. Of course, if you're not working with PODs, you can't use memcpy anyway.

I'm going to go against the general wisdom here that std::copy will have a slight, almost imperceptible performance loss. I just did a test and found that to be untrue: I did notice a performance difference. However, the winner was std::copy.
I wrote a C++ SHA-2 implementation. In my test, I hash 5 strings using all four SHA-2 versions (224, 256, 384, 512), and I loop 300 times. I measure times using Boost.Timer. That 300-iteration loop count is enough to completely stabilize my results. I ran the test 5 times each, alternating between the memcpy version and the std::copy version. My code takes advantage of grabbing data in as large chunks as possible (many other implementations operate with char/char *, whereas I operate with T/T *, where T is the largest type in the user's implementation that has correct overflow behavior), so fast memory access on the largest types I can use is central to the performance of my algorithm. These are my results:

Time (in seconds) to complete run of SHA-2 tests
Total average increase in speed of std::copy over memcpy: 2.99%

My compiler is gcc 4.6.3 on Fedora 16 x86_64. My optimization flags are -Ofast -march=native -funsafe-loop-optimizations.

Code for my SHA-2 implementations.
I decided to run a test on my MD5 implementation as well. The results were much less stable, so I decided to do 10 runs. However, after my first few attempts, I got results that varied wildly from one run to the next, so I'm guessing there was some sort of OS activity going on. I decided to start over.
Same compiler settings and flags. There is only one version of MD5, and it's faster than SHA-2, so I did 3000 loops on a similar set of 5 test strings.
These are my final 10 results:
Time (in seconds) to complete run of MD5 tests
Total average decrease in speed of std::copy over memcpy: 0.11%
Code for my MD5 implementation.
These results suggest that there is some optimization that std::copy could use in my SHA-2 tests but not in my MD5 tests. In the SHA-2 tests, both arrays were created in the same function that called std::copy/memcpy. In my MD5 tests, one of the arrays was passed into the function as a function parameter.

I did a little bit more testing to see what I could do to make std::copy faster again. The answer turned out to be simple: turn on link-time optimization. These are my results with LTO turned on (option -flto in gcc):

Time (in seconds) to complete run of MD5 tests with -flto
Total average increase in speed of std::copy over memcpy: 0.72%

In summary, there does not appear to be a performance penalty for using std::copy. In fact, there appears to be a performance gain.

Explanation of results
So why might std::copy give a performance boost?

First, I would not expect it to be slower for any implementation, as long as the optimization of inlining is turned on. All compilers inline aggressively; it is possibly the most important optimization because it enables so many other optimizations. std::copy can (and I suspect all real-world implementations do) detect that the arguments are trivially copyable and that memory is laid out sequentially. This means that in the worst case, when memcpy is legal, std::copy should perform no worse. The trivial implementation of std::copy that defers to memcpy should meet your compiler's criteria of "always inline this when optimizing for speed or size".
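As a hypothetical sketch of that fast path (real standard libraries differ in detail), a copy routine specialized for pointers might dispatch like this:

#include <cstddef>
#include <cstring>
#include <type_traits>

// If T is trivially copyable, defer to memcpy; otherwise fall back
// to an element-by-element copy loop. (Requires C++17 for if constexpr.)
template <typename T>
T* copy_ptrs(const T* first, const T* last, T* d_first)
{
    if constexpr (std::is_trivially_copyable_v<T>) {
        std::memcpy(d_first, first,
                    static_cast<std::size_t>(last - first) * sizeof(T));
        return d_first + (last - first);
    } else {
        while (first != last)
            *d_first++ = *first++;
        return d_first;
    }
}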
However, std::copy also keeps more of its information. When you call std::copy, the function keeps the types intact. memcpy operates on void *, which discards almost all useful information. For instance, if I pass in an array of std::uint64_t, the compiler or library implementer may be able to take advantage of 64-bit alignment with std::copy, but it may be more difficult to do so with memcpy. Many implementations of algorithms like this work by first handling the unaligned portion at the start of the range, then the aligned portion, then the unaligned portion at the end. If the whole range is guaranteed to be aligned, then the code becomes simpler and faster, and easier for the branch predictor in your processor to get correct.

Premature optimization?
std::copy is in an interesting position. I expect it never to be slower than memcpy, and sometimes to be faster, with any modern optimizing compiler. Moreover, anything that you can memcpy, you can std::copy. memcpy does not allow any overlap in the buffers, whereas std::copy supports overlap in one direction (with std::copy_backward for the other direction of overlap). memcpy only works on pointers; std::copy works on any iterators (std::map, std::vector, std::deque, or my own custom type). In other words, you should just use std::copy when you need to copy chunks of data around.
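A small illustration (hypothetical) of that iterator flexibility:

#include <algorithm>
#include <deque>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3};

    // Copy from a vector into a deque through an insert iterator.
    std::deque<int> d;
    std::copy(v.begin(), v.end(), std::back_inserter(d));

    // Copy key/value pairs out of a map.
    std::map<int, int> m = {{1, 10}, {2, 20}};
    std::vector<std::pair<int, int>> pairs;
    std::copy(m.begin(), m.end(), std::back_inserter(pairs));
}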
Always use std::copy, because memcpy is limited to C-style POD structures, and the compiler will likely replace calls to std::copy with memcpy anyway if the targets are in fact POD. Plus, std::copy can be used with many iterator types, not just pointers. std::copy is more flexible for no performance loss and is the clear winner.