C++: Structs slower to access than basic variables

Posted 2020-07-02 11:48

I found some code that had "optimization" like this:

void somefunc(SomeStruct param){
    float x = param.x; // param.x and x are both floats; supposedly this makes access faster
    float y = param.y;
    float z = param.z;
}

The comments claim this makes variable access faster, but I've always thought accessing a struct member is just as fast as accessing a plain (non-struct) variable.

Could someone clear this up for me?

Tags: c++ struct
9 Answers
forever°为你锁心
#2 · 2020-07-02 12:10

The usual rules for optimization (Michael A. Jackson) apply: 1. Don't do it. 2. (For experts only:) Don't do it yet.

That being said, let's assume this sits in the innermost loop that takes 80% of the time of a performance-critical application. Even then, I doubt you will ever see any difference. Take this piece of code as an example:

struct Xyz {
    float x, y, z;
};

float f(Xyz param){
    return param.x + param.y + param.z;
}

float g(Xyz param){
    float x = param.x;
    float y = param.y;
    float z = param.z;
    return x + y + z;
}

Running it through LLVM shows: only with optimizations disabled do the two behave as you might expect (g copies the struct members into locals and then sums those; f sums the values fetched from param directly). At standard optimization levels, both compile to identical code (the values are extracted once, then summed).

For short code, this "optimization" is actually harmful, as it copies the floats needlessly. For longer code that uses the members in several places, it might help a teensy bit if you actively tell your compiler to be stupid. A quick test with 65 (instead of 2) additions of the members/locals confirms this: with no optimizations, f repeatedly loads the struct members while g reuses the already extracted locals. The optimized versions are again identical and both extract the members only once. (Surprisingly, there's no strength reduction turning the repeated additions into multiplications, even with LTO enabled, but that just indicates the LLVM version used isn't optimizing too aggressively anyway, so it should work just as well in other compilers.)
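
For illustration, here is a minimal sketch (not the original test code) of the kind of "longer" function that test used, with the members referenced several times; the real test used 65 additions, only three repetitions are shown here to keep it short:

float f_long(Xyz param){
    // every use goes through the parameter; unoptimized code reloads each member
    return param.x + param.y + param.z
         + param.x + param.y + param.z
         + param.x + param.y + param.z;
}

float g_long(Xyz param){
    // the members are extracted into locals once and then reused
    float x = param.x;
    float y = param.y;
    float z = param.z;
    return x + y + z
         + x + y + z
         + x + y + z;
}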

So, the bottom line is: unless you know your code will have to be compiled by a compiler that's so outrageously stupid and/or ancient that it won't optimize anything, you now have proof that the compiler makes both versions equivalent, and you can thus do away with this crime against readability and brevity committed in the name of performance. (Repeat the experiment for your particular compiler if necessary.)

Evening l夕情丶
#3 · 2020-07-02 12:13

When you operate on a "simple" variable (not one inside a struct/class), the system only has to go to that location and fetch the data it wants.

But when you refer to a variable inside a struct or class, like A.B, the system needs to work out where B lives inside the area called A (because there may be other members declared before it), and that takes a bit more work than the plainer access described above.
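
For what it's worth, the "where is B inside A" lookup is resolved at compile time; here is a minimal sketch, using a hypothetical struct, showing that the offset is a fixed constant rather than something computed at runtime:

#include <cstddef>  // offsetof
#include <cstdio>

struct A {
    int   other;  // a member declared before b pushes b to a non-zero offset
    float b;
};

int main() {
    // offsetof(A, b) is a compile-time constant; accessing a.b compiles to a
    // load at (address of a) + that constant, not a runtime search.
    std::printf("offsetof(A, b) = %zu\n", offsetof(A, b));
}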

Evening l夕情丶
#4 · 2020-07-02 12:14

Rule of thumb: it's not slow unless the profiler says it is. Let the compiler worry about micro-optimisations (they're pretty smart about those; after all, they've been doing it for years) and focus on the bigger picture.

#5 · 2020-07-02 12:18

In unoptimised code:

  • function parameters (which are not passed by reference) are on the stack
  • local variables are also on the stack

Unoptimised access to local variables and function parameters looks, in assembly, more or less like this:

mov %eax, [%ebp + compile-time-constant]

where %ebp is the frame pointer (a sort of 'this' pointer for the function).

It makes no difference whether you access a parameter or a local variable.

The fact that you are accessing an element of a struct makes absolutely no difference from the assembly/machine point of view: the member sits at a fixed, compile-time-known offset from the start of the struct. Structs are constructs made in C to make the programmer's life easier.
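
As a hypothetical illustration of that point (the struct and the offsets in the comments are mine, purely for illustration): whether you declare three separate locals or one struct holding them on the stack, each access compiles down to a load at a fixed offset from the frame pointer.

struct Vec3 { float x, y, z; };  // hypothetical struct for illustration

float sum_locals() {
    float x = 1.0f, y = 2.0f, z = 3.0f;  // three locals, e.g. at [%ebp-4], [%ebp-8], [%ebp-12]
    return x + y + z;
}

float sum_struct() {
    Vec3 v = {1.0f, 2.0f, 3.0f};         // members at fixed offsets, e.g. [%ebp-12], [%ebp-8], [%ebp-4]
    return v.x + v.y + v.z;              // each access is still a single fixed-offset load
}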

So, ultimately, my answer is: no, there is absolutely no benefit in doing that.

成全新的幸福
#6 · 2020-07-02 12:19

There are good and valid reasons to do that kind of optimization when pointers are used, because consuming all inputs first frees the compiler from possible aliasing issues which prevent it from producing optimal code (there's restrict nowadays too, though).
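
A hypothetical sketch of the pointer case being described (the functions and names below are illustrative, not from the original): if dst might alias src, the compiler must assume that each store through dst can change what src points to, so consuming the inputs into locals first (or marking the pointers restrict) lets it keep the values in registers.

struct Xyz { float x, y, z; };  // same three-float struct as in the earlier answer

void scale_aliased(Xyz* dst, const Xyz* src, float k) {
    dst->x = src->x * k;  // after this store the compiler may have to re-read src->y and src->z
    dst->y = src->y * k;
    dst->z = src->z * k;
}

void scale_copied(Xyz* dst, const Xyz* src, float k) {
    float x = src->x, y = src->y, z = src->z;  // consume all inputs first
    dst->x = x * k;  // the stores can no longer invalidate the values already loaded
    dst->y = y * k;
    dst->z = z * k;
}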

For non-pointer types, there is in theory an overhead, because every member is accessed via the struct's base address (its 'this' pointer, so to speak). This might be noticeable inside an inner loop and is a negligible overhead otherwise.
In practice, however, a modern compiler will almost always (unless there is a complex inheritance hierarchy) produce exactly the same binary code.

I asked myself the exact same question about two years ago and ran a very extensive test case using gcc 4.4. My finding was that unless you deliberately try to trip up the compiler, there is absolutely no difference in the generated code.

Juvenile、少年°
#7 · 2020-07-02 12:23

I'm no compiler guru, so take this with a grain of salt. I'm guessing that the original author of the code assumed that by copying the values from the struct into local variables, the compiler would "place" those variables in floating-point registers, which are available on some platforms (e.g., x86). If there aren't enough registers to go around, they'd be put on the stack.

That being said, unless this code is in the middle of an intensive computation or loop, I'd strive for clarity rather than speed. It's pretty rare that anyone will notice a few instructions' worth of difference in timing.
