Performance cost of passing by value vs. by reference

Posted 2019-02-11 16:09

做个烂人

Question:

Let's consider an object foo (which may be an int, a double, a custom struct, a class, whatever). My understanding is that passing foo by reference to a function (or just passing a pointer to foo) leads to higher performance since we avoid making a local copy (which could be expensive if foo is large).

However, according to another answer, pointers on a 64-bit system can be expected in practice to have a size of 8 bytes, regardless of what is being pointed to. On my system, a float is 4 bytes. Does that mean that if foo is of type float, then it is more efficient to just pass foo by value rather than give a pointer to it (assuming no other constraints that would make using one more efficient than the other inside the function)?

Answer 1:

It depends on what you mean by "cost", and on how the host system (hardware, operating system) performs the operations involved.

If your cost measure is memory usage, then the calculation of cost is obvious - add up the sizes of whatever is being copied.
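As a rough illustration of the memory-cost view, the sketch below compares the bytes copied for a by-value argument with the bytes copied for a pointer. The names `Big`, `bytes_by_value`, and `bytes_by_pointer` are hypothetical and introduced only for this example, and it assumes a mainstream platform where all object pointers have the same size.

```cpp
#include <cstddef>

// Hypothetical large type, used only for illustration.
struct Big {
    double samples[64];
};

// Bytes copied when an argument of type T is passed by value.
template <class T>
constexpr std::size_t bytes_by_value() { return sizeof(T); }

// Bytes copied when an argument of type T is passed by pointer.
// A reference is typically, though not necessarily, implemented the same way.
template <class T>
constexpr std::size_t bytes_by_pointer() { return sizeof(T*); }
```

On a typical 64-bit system, `bytes_by_value<float>()` would be 4 and `bytes_by_pointer<float>()` would be 8, while `bytes_by_value<Big>()` is far larger than `bytes_by_pointer<Big>()`. This only counts bytes; it says nothing about speed.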

If your measure is execution speed (or "efficiency"), then the game is different. Hardware (and operating systems and compilers) tends to be optimised for copying things of particular sizes, by virtue of dedicated circuits (machine registers, and how they are used).

It is common, for example, for a machine to have an architecture (machine registers, memory architecture, etc.) which results in a "sweet spot": copying variables of some size is most "efficient", but copying larger OR SMALLER variables is less so. Larger variables will cost more to copy, because there may be a need to do multiple copies of smaller chunks. Smaller ones may also cost more, because the compiler needs to copy the smaller value into a larger variable (or register), do the operations on it, then copy the value back.

Examples with floating point include some Cray supercomputers, which natively support double-precision floating point (aka double in C++), while all operations on single precision (aka float in C++) are emulated in software. Some older 32-bit x86 CPUs also worked internally with 32-bit integers, and operations on 16-bit integers required more clock cycles due to translation to/from 32-bit (this is not true of more modern 32-bit or 64-bit x86 processors, as they allow copying 16-bit integers to/from 32-bit registers, and operating on them, with fewer such penalties).

It is a bit of a no-brainer that copying a very large structure by value will be less efficient than creating and copying its address. But, because of factors like the above, the cross-over point between "best to copy something of that size by value" and "best to pass its address" is less clear.

Pointers and references tend to be implemented in a similar manner (e.g. pass by reference can be implemented in the same way as passing a pointer) but that is not guaranteed.

The only way to be sure is to measure it. And realise that the measurements will vary between systems.
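A minimal measurement sketch, assuming `std::chrono::steady_clock` is good enough for a coarse comparison, might look like the following. The functions `scale_by_value`, `scale_by_ref`, and `time_calls` are hypothetical names made up for this example. With optimization enabled the compiler may inline both calls and produce identical code, which is itself a useful result.

```cpp
#include <chrono>
#include <cstddef>

// Two semantically equivalent functions that differ only in how x is passed.
float scale_by_value(float x) { return x * 1.5f + 1.0f; }
float scale_by_ref(const float& x) { return x * 1.5f + 1.0f; }

// Calls fn(arg) `iterations` times and returns the elapsed time in nanoseconds.
// Results are accumulated into a volatile so the calls are not discarded.
template <class Fn>
long long time_calls(Fn fn, std::size_t iterations) {
    volatile float sink = 0.0f;
    float arg = 1.0f;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = sink + fn(arg);
        arg += 0.001f;  // vary the input a little between calls
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Comparing `time_calls(scale_by_value, n)` with `time_calls(scale_by_ref, n)` for a large `n`, at the optimization level of the real build and repeated several times, gives a first impression. The numbers should be expected to differ between compilers and machines.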

Answer 2:

There is one thing nobody mentioned.

There is a certain GCC optimization called IPA SRA, that replaces "pass by reference" with "pass by value" automatically: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html (-fipa-sra)

This is most likely done for scalar types (e.g. int, double, etc.) that do not have non-default copy semantics and can fit into CPU registers.

This makes

void f(const int &x);

probably as fast (and as space-efficient) as

void f(int x);

So with this optimization enabled, using references for small types should be as fast as passing them by value.

On the other hand, passing (for example) std::string by value could not be optimized to by-reference speed, because custom copy semantics are involved.
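To make the cost of custom copy semantics visible, here is a small sketch with a hypothetical `Tracked` wrapper around std::string that counts its own copies. The type and the two `length_*` functions are assumptions made up for this example, not anything from GCC or the standard library.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper that counts how often it is copied.
struct Tracked {
    std::string text;
    static int copies;

    explicit Tracked(std::string t) : text(std::move(t)) {}
    // User-provided copy constructor with an observable side effect.
    Tracked(const Tracked& other) : text(other.text) { ++copies; }
};
int Tracked::copies = 0;

// Passing an existing object by value invokes the copy constructor.
std::size_t length_by_value(Tracked t) { return t.text.size(); }

// Passing by const reference does not copy the object.
std::size_t length_by_ref(const Tracked& t) { return t.text.size(); }
```

Calling `length_by_value(t)` on an existing object `t` increments `Tracked::copies`, while `length_by_ref(t)` leaves it unchanged. Because the copy has an observable effect here, the compiler cannot silently drop it.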

From what I understand, using pass by reference for everything should never be slower than manually picking what to pass by value and what to pass by reference.

This is especially useful for templates:

template<class T>
void f(const T&)
{
    // Something
}

is always optimal
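For comparison, the manual selection that this answer argues is unnecessary could be sketched roughly as below. `param_t` and `doubled` are hypothetical names, and the two-pointer size threshold is an arbitrary assumption rather than a recommended value.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: small, trivially copyable types are passed by value,
// everything else by const reference. The size threshold is an assumption.
template <class T>
using param_t = typename std::conditional<
    std::is_trivially_copyable<T>::value && sizeof(T) <= 2 * sizeof(void*),
    T,
    const T&>::type;

// T cannot be deduced through param_t, so callers must name it explicitly.
template <class T>
T doubled(param_t<T> x) { return x + x; }
```

A call then looks like `doubled<int>(21)` or `doubled<std::string>("ab")`. The need to spell out the template argument is one practical drawback of this approach compared with a plain `const T&` parameter.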

Answer 3:

Does that mean that if foo is of type float, then it is more efficient to just pass foo by value?

Passing a float by value could be more efficient. I would expect it to be more efficient, partly because of what you said: a float is smaller than a pointer on the system you describe. But in addition, when you pass the pointer, you still need to dereference it to get the value within the function. The indirection added by the pointer could have a significant effect on performance.

The efficiency difference could be negligible. In particular, if the function can be inlined and optimization is enabled, there is likely not going to be any difference.

You can find out if there is any performance gain from passing the float by value in your case by measuring. You can measure the efficiency with a profiling tool.

You may substitute pointer with reference and the answer will still apply equally well.

Is there some sort of overhead in using a reference, the way that there is when a pointer must be dereferenced?

Yes. It is likely that a reference has exactly the same performance characteristics as a pointer does. If it is possible to write a semantically equivalent program using either references or pointers, both are probably going to generate identical assembly.
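As a sketch of what "semantically equivalent" means here, the three hypothetical functions below compute the same result through a reference, a pointer, and a copy. One way to check the claim on a given toolchain is to compare the assembly the compiler emits for each (for example with the `-S` option of GCC or Clang).

```cpp
// Reads the float through a reference.
float twice_ref(const float& x) { return x + x; }

// Equivalent version written with a pointer (p must not be null).
float twice_ptr(const float* p) { return *p + *p; }

// By-value version for comparison: no indirection inside the function.
float twice_val(float x) { return x + x; }
```

When the calls are not inlined, the first two would be expected to load the value through an address, while the third receives the value directly.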

If passing a small object by pointer would be faster than copying it, then surely it would be true for any object of the same size, wouldn't you agree? How about a pointer to a pointer? That's about the size of a pointer, right? (It's exactly the same size.) Oh, but pointers are objects too. So, if passing an object (such as a pointer) by pointer is faster than copying the object (the pointer), then passing a pointer to a pointer to a pointer to a pointer ... to a pointer would be faster than the program with fewer pointers, which is still faster than the one that didn't use pointers... Perhaps we've found an infinite source of efficiency here :)

Answer 4:

You must test any given scenario where performance is absolutely critical, but be very careful about trying to force the compiler to generate code in a specific way.

The compiler's optimizer is allowed to rewrite your code in any way it chooses as long as the final result is provably the same, which can lead to some very nice optimizations.
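The "provably the same" condition matters because a reference and a copy are not always interchangeable. In the sketch below (function names are hypothetical), the reference parameter may alias the object being modified, so the two versions can produce different results for the same call.

```cpp
// Adds delta to total twice. If delta refers to the same object as total,
// the second addition sees the value written by the first.
void add_twice_ref(int& total, const int& delta) {
    total += delta;
    total += delta;
}

// Same operation, but delta is a private copy that cannot change mid-function.
void add_twice_val(int& total, int delta) {
    total += delta;
    total += delta;
}
```

Starting from `int a = 1;`, the call `add_twice_ref(a, a)` leaves `a` at 4, whereas `add_twice_val(a, a)` from the same starting value leaves it at 3. The optimizer may only swap one form for the other when it can rule out such aliasing.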

Consider that passing a float by value requires making a copy of the float, but under the right conditions, passing a float by reference could allow storing the original float in a CPU floating-point register and treating that register as the "reference" parameter to the function. By contrast, if you pass a copy, the compiler has to find a place to store the copy in order to preserve the contents of the register, or even worse, it may not be able to use a register at all because of the need to preserve the original (this is especially true in recursive functions!).

This difference is also important if you are passing the reference to a function that could be inlined, where the reference may reduce the cost of inlining since the compiler doesn't have to guarantee that a copied parameter cannot modify the original.

The more a language allows you to focus on describing what you want done rather than how you want it done, the more the compiler is able to find creative ways of doing the hard work for you. In C++ especially, it is generally best not to worry about performance, and instead focus on describing what you want as clearly and simply as possible. By trying to describe how you want the work done, you will just as often prevent the compiler from doing its job of optimizing your code for you.