C++: Structs slower to access than basic variables

Posted 2020-07-02 11:48

I found some code that had "optimization" like this:

void somefunc(SomeStruct param){
    float x = param.x; // param.x and x are both floats. supposedly this makes it faster access
    float y = param.y;
    float z = param.z;
}

And the comments said that it makes the variable access faster, but I've always thought that accessing a struct member is just as fast as if it weren't in a struct at all.

Could someone clear this up for me?

Tags: c++ struct
9 answers
别忘想泡老子
#2 · 2020-07-02 12:25

The compiler may generate faster code for the float-to-float copies, but when x is actually used it will be converted to the FPU's internal representation anyway.

叼着烟拽天下
#3 · 2020-07-02 12:28

You'd have to look at the compiled code on a particular implementation to be sure, but there's no reason in principle why your preferred code (using the struct members) should necessarily be any slower than the code you've shown (copying into variables and then using the variables).

somefunc takes a struct by value, so it has its own local copy of that struct. The compiler is perfectly at liberty to apply exactly the same optimizations to the struct members as it would to the float variables. They're both automatic variables, and in both cases the "as-if" rule allows them to be stored in register(s) rather than in memory, provided that the function produces the correct observable behavior.

That is unless, of course, you take a pointer to the struct and use it, in which case the values need to be written to memory somewhere, in the correct order, at the location the pointer refers to. This starts to limit optimization, and further limits come from the fact that once you pass around a pointer to an automatic variable, the compiler can no longer assume the variable name is the only reference to that memory, and hence the only way its contents can be modified. Having multiple references to the same object is called "aliasing", and it does sometimes block optimizations that could be made if the object were known not to be aliased.

Then again, if this is an issue and the rest of the code in the function does use a pointer to the struct, then copying the values into separate variables could put you on dodgy ground from the point of view of correctness, not just performance. So the claimed optimization is not quite as straightforward as it looks in that case.
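To make the aliasing point concrete, here is a minimal sketch. SomeStruct and the external inspect() function are assumptions added purely for illustration; neither appears in the original question:

struct SomeStruct { float x, y, z; };

void inspect(SomeStruct *p); // defined in another translation unit, opaque to the optimizer

float somefunc(SomeStruct param){
    float acc = 0.0f;
    for (int i = 0; i < 10; ++i) {
        inspect(&param);          // param is now aliased: the call may read or modify it
        acc += param.x * param.z; // so these members may need to be reloaded from memory
    }
    return acc;
}

Here, hoisting param.x and param.z into local floats before the loop would not just be a performance question but a correctness one, since each call to inspect() is allowed to change them.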

Now, there may be particular compilers (or particular optimization levels) which fail to apply to structs all the optimizations that they're permitted to apply, but do apply equivalent optimizations to float variables. If so then the comment would be right, and that's why you have to check to be sure. For example, maybe compare the emitted code for this:

float somefunc(SomeStruct param){
    float x = param.x; // param.x and x are both floats. supposedly this makes it faster access
    float y = param.y;
    float z = param.z;
    for (int i = 0; i < 10; ++i) {
        x += (y + i) * z;
    }
    return x;
}

with this:

float somefunc(SomeStruct param){
    for (int i = 0; i < 10; ++i) {
        param.x += (param.y + i) * param.z;
    }
    return param.x;
}

There may also be optimization levels where the extra variables make the code worse. I wouldn't put much trust in a code comment that says "supposedly this makes it faster access"; it sounds like the author doesn't really have a clear idea why it matters. "Apparently this makes access faster - I don't know why, but the tests confirming it, and demonstrating that it makes a noticeable difference in the context of our program, are in source control at the following location" would be a lot more like it ;-)
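For anyone who wants to run that comparison themselves, a minimal SomeStruct definition is enough; this one is an assumption, since the question never shows the real thing:

struct SomeStruct {
    float x;
    float y;
    float z;
};

With that in place you can compile each version with something like g++ -O2 -S and diff the emitted assembly, which is exactly the kind of check the "supposedly" comment never seems to have had behind it.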

闹够了就滚
#4 · 2020-07-02 12:30

The real answer is given by Piotr. This one is just for fun.

I have tested it. This code:

float somefunc(SomeStruct param, float &sum){
    float x = param.x;
    float y = param.y;
    float z = param.z;
    float xyz = x * y * z;
    sum = x + y + z;
    return xyz;
}

And this code:

float somefunc(SomeStruct param, float &sum){
    float xyz = param.x * param.y * param.z;
    sum = param.x + param.y + param.z;
    return xyz;
}

Generate identical assembly code when compiled with g++ -O2. They do generate different code with optimization turned off, though. Here is the difference:

<   movl    -32(%rbp), %eax
<   movl    %eax, -4(%rbp)
<   movl    -28(%rbp), %eax
<   movl    %eax, -8(%rbp)
<   movl    -24(%rbp), %eax
<   movl    %eax, -12(%rbp)
<   movss   -4(%rbp), %xmm0
<   mulss   -8(%rbp), %xmm0
<   mulss   -12(%rbp), %xmm0
<   movss   %xmm0, -16(%rbp)
<   movss   -4(%rbp), %xmm0
<   addss   -8(%rbp), %xmm0
<   addss   -12(%rbp), %xmm0
---
>   movss   -32(%rbp), %xmm1
>   movss   -28(%rbp), %xmm0
>   mulss   %xmm1, %xmm0
>   movss   -24(%rbp), %xmm1
>   mulss   %xmm1, %xmm0
>   movss   %xmm0, -4(%rbp)
>   movss   -32(%rbp), %xmm1
>   movss   -28(%rbp), %xmm0
>   addss   %xmm1, %xmm0
>   movss   -24(%rbp), %xmm1
>   addss   %xmm1, %xmm0

The lines marked < correspond to the version with the "optimization" variables. It seems to me that the "optimized" version is actually slower than the one with no extra variables. This is to be expected, though, as x, y and z are allocated on the stack exactly like param. What's the point of allocating more stack variables just to duplicate existing ones?

If the person who made that "optimization" knew the language better, they would probably have declared those variables register (a hint that modern compilers largely ignore, and that C++17 removed entirely), but even that leaves the "optimized" version slightly slower and longer, at least with g++ on x86-64.
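For anyone who wants to reproduce the diff above, here is a self-contained version with the two functions renamed so they fit in one file; the SomeStruct definition is an assumption, since the original post never shows it:

struct SomeStruct {
    float x;
    float y;
    float z;
};

// The version with the extra "optimization" locals.
float with_locals(SomeStruct param, float &sum){
    float x = param.x;
    float y = param.y;
    float z = param.z;
    float xyz = x * y * z;
    sum = x + y + z;
    return xyz;
}

// The version that reads the struct members directly.
float direct(SomeStruct param, float &sum){
    float xyz = param.x * param.y * param.z;
    sum = param.x + param.y + param.z;
    return xyz;
}

Compiling with g++ -S (no optimization) and diffing the assembly of the two functions shows the extra stack traffic quoted above; with g++ -O2 -S the two bodies come out identical.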
