C pointers vs direct member access for structs

Posted 2019-01-24 08:15

Question:

Say I have a struct like the following ...

typedef struct {
  int WheelCount;
  double MaxSpeed;
} Vehicle;

... and I have a global variable of this type. (I'm well aware of the pitfalls of globals; this is for an embedded system that I didn't design, and for which they're an unfortunate but necessary evil.) Is it faster to access the members of the struct directly or through a pointer? That is:

double LocalSpeed = MyGlobal.MaxSpeed;

or

double LocalSpeed = pMyGlobal->MaxSpeed;

One of my tasks is to simplify and fix a recently inherited embedded system.

Answer 1:

In general, I'd say go with the first option:

double LocalSpeed = MyGlobal.MaxSpeed;

This has one less dereference (you're not loading the pointer, then dereferencing it to get to its location). It's also simpler and easier to read and maintain, since you don't need to create the pointer variable in addition to the struct.

That being said, I don't think any performance difference you'd see would be noticeable, even on an embedded system. Both will have very, very fast access times.
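To make the "one less dereference" point concrete, here is a minimal sketch of the two access paths as separate functions, which also makes them easy to find in a disassembly. The initial values and the function names read_direct and read_via_pointer are assumptions for illustration; only Vehicle, MyGlobal and pMyGlobal come from the question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int WheelCount;
  double MaxSpeed;
} Vehicle;

Vehicle MyGlobal = {4, 120.0};   /* example values, not from the question */
Vehicle *pMyGlobal = &MyGlobal;  /* non-const, so it may change at run time */

/* Direct access: the member's address is typically fixed at link time. */
double read_direct(void) {
  return MyGlobal.MaxSpeed;
}

/* Pointer access: load pMyGlobal first, then load the member through it. */
double read_via_pointer(void) {
  assert(pMyGlobal != NULL);
  return pMyGlobal->MaxSpeed;
}
```

If the pointer never changes, declaring it as Vehicle *const may let the compiler fold the address and emit the same code for both functions, but that depends on the compiler and the optimization level.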



Answer 2:

The first one should be faster since it doesn't require pointer dereferencing. Then again, that's true for x86-based systems; I'm not sure about others.

On x86 the first one would translate to something like this:

mov eax, [address of MyGlobal.MaxSpeed]

and the second one would be something like this:

mov ebx, [address of pMyGlobal]
mov eax, [ebx + offset of MaxSpeed]
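The displacement added to the pointer in the second load is the member's offset within the struct, which includes any padding after WheelCount, so it is not necessarily sizeof(int). A rough sketch of how to check that on your own target, using offsetof from <stddef.h>; the helper names max_speed_offset and read_max_speed_by_offset are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  int WheelCount;
  double MaxSpeed;
} Vehicle;

/* Offset of MaxSpeed from the start of the struct, padding included. */
size_t max_speed_offset(void) {
  return offsetof(Vehicle, MaxSpeed);
}

/* Read MaxSpeed the way the second load does: base address plus offset. */
double read_max_speed_by_offset(const Vehicle *base) {
  double value;
  assert(base != NULL);
  memcpy(&value, (const unsigned char *)base + max_speed_offset(),
         sizeof value);
  return value;
}
```

The value of max_speed_offset() depends on the ABI: compilers that align double to 8 bytes will place MaxSpeed at offset 8, others may place it at 4.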


Answer 3:

On your embedded platform, it's likely that the architecture is optimized in such a way that it's essentially a wash, and even if it weren't, you would only ever notice a performance impact if this were executed in a very tight loop.

There are probably much more obvious places in your system to look for performance gains.



Answer 4:

struct dataStruct
{
    double first;
    double second;
} data;

int main()
{
    struct dataStruct *pData = &data;  /* "struct" keyword is required in C */

    data.first = 9.0;
    pData->second = 10.0;
}

This is the assembly output using VS2008 release mode:

    data.first = 9.0;
008D1000  fld         qword ptr [__real@4022000000000000 (8D20F0h)] 

    pData->second = 10.0;
008D1006  xor         eax,eax 
008D1008  fstp        qword ptr [data (8D3378h)] 
008D100E  fld         qword ptr [__real@4024000000000000 (8D20E8h)] 
008D1014  fstp        qword ptr [data+8 (8D3380h)] 


Answer 5:

disassemble, disassemble, disassemble...

Depending on the lines of code you are not showing us, it is possible that if your pointer is effectively static, a good compiler will know that and precompute the address for both. If you don't have optimizations on, then this whole discussion is moot. It also depends on the processor you are using: on some processors both can be performed with a single instruction. So I follow the basic optimization steps:

1) disassemble and examine
2) time the execution

As mentioned above, though, the bottom line is that it may be a case of two instructions instead of one, costing a single clock cycle that you would likely never see. The quality of your compiler and your choice of optimizer settings will make a much more dramatic performance difference than tweaking one line of code in the hope of improving performance. Switching compilers can give you 10-20% in either direction, sometimes more. So can changing your optimization flags: turning everything on doesn't always produce the fastest code, and sometimes -O1 performs better than -O3.

Understanding what those two lines of code produce, and how to maximize performance from the high-level language, comes from compiling for different processors and disassembling the output of various compilers. More importantly, the code around the lines in question plays a big role in how the compiler optimizes that segment.
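For the "time the execution" step, a very rough harness might look like the sketch below. It assumes a hosted toolchain where clock() from <time.h> is available (on a bare-metal target you would read a hardware timer instead), and the names sink, seconds_direct and seconds_via_pointer are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef struct {
  int WheelCount;
  double MaxSpeed;
} Vehicle;

Vehicle MyGlobal = {4, 120.0};   /* example values */
Vehicle *pMyGlobal = &MyGlobal;

/* Volatile sink so the stores in the loops are not optimized away. */
static volatile double sink;

/* CPU seconds spent reading the member directly, iterations times. */
double seconds_direct(long iterations) {
  assert(iterations >= 0);
  clock_t start = clock();
  for (long i = 0; i < iterations; ++i)
    sink = MyGlobal.MaxSpeed;
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent reading the member through the global pointer. */
double seconds_via_pointer(long iterations) {
  assert(iterations >= 0 && pMyGlobal != NULL);
  clock_t start = clock();
  for (long i = 0; i < iterations; ++i)
    sink = pMyGlobal->MaxSpeed;
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Treat the numbers with suspicion: an optimizer is free to hoist either load out of the loop, so check the disassembly of the loop body before trusting a measured difference.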

Using someone else's example on this question:

typedef struct
{
    unsigned int first;
    unsigned int second;
} dataStruct;

dataStruct data;

int main()
{
    dataStruct *pData = &data;

    data.first = 9;
    pData->second = 10;

    return(0);
}

With gcc (not that great a compiler) you get:

mov r2, #10
mov r1, #9
stmia   r3, {r1, r2}

So both lines of C code are joined into one store. The problem here is the example used as a test. Two separate functions would have been a little better, but the test needs a lot more code around it, and the pointer needs to point at some other memory so the optimizer doesn't realize it is a static global address. To test this, you need to pass the address in so the compiler (well, gcc) cannot figure out that it is a static address.

Or, with no optimizations (same code, same compiler), there is no difference between pointer and direct access:

mov r3, #9
str r3, [r2, #0]

mov r3, #10
str r3, [r2, #4]

This is what you would expect to see. Depending on the compiler and processor, there may be no difference. For this processor, even if the test code hid the static address for the pointer from the function, it would still boil down to two instructions. If the value being stored in the structure element were already loaded in a register, then it would be one instruction either way, pointer or direct.

So the answer to your question is not absolute... it depends. Disassemble and test.
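A test that hides the address from the optimizer, as described above, could be sketched like this. The function names store_direct and store_via_pointer are assumptions for illustration; the struct and the stored values are taken from the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned int first;
    unsigned int second;
} dataStruct;

dataStruct data;

/* Direct access: the compiler knows the address of data statically. */
void store_direct(void) {
    data.first = 9;
    data.second = 10;
}

/* Pointer access: the address arrives as an argument at run time. */
void store_via_pointer(dataStruct *pData) {
    assert(pData != NULL);
    pData->first = 9;
    pData->second = 10;
}
```

For the comparison to be fair, the caller of store_via_pointer should probably live in a different source file, otherwise the compiler may inline the call and see the static address anyway.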



Answer 6:

I suppose that, if this makes a difference at all, that would be architecture-dependent.



Answer 7:

In general, accessing the struct directly would be quicker, as it won't require an extra pointer dereference. The pointer dereference means that it has to take the pointer (the thing in the variable), load whatever it points to, then operate on it.



Answer 8:

In C, there should be no difference, or at most an insignificant performance hit.

C students are taught:

pMyGlobal->MaxSpeed == (*pMyGlobal).MaxSpeed

You should be able to compare the disassembly of them both to convince yourself that they are essentially the same, even if you aren't an Assembly-code programmer.
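As a small sanity check of that identity, the sketch below compares the addresses produced by the two spellings. The function name arrow_equals_star_dot is made up for this example. Note that this only shows the two pointer notations are the same; it says nothing about how either compares to MyGlobal.MaxSpeed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int WheelCount;
  double MaxSpeed;
} Vehicle;

/* Returns 1 when p->MaxSpeed and (*p).MaxSpeed name the same object. */
int arrow_equals_star_dot(Vehicle *p) {
  assert(p != NULL);
  return &p->MaxSpeed == &(*p).MaxSpeed;
}
```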

If you are looking for a performance optimization, I would look elsewhere. You won't be able to save enough CPU cycles with this kind of micro-optimization.

For stylistic reasons, I prefer the Structure-Dot notation, especially when dealing with singleton-globals. I find it much cleaner to read.



Answer 9:

Direct member access is faster (with a pointer you typically get one extra dereference operation), although I'm having a hard time imagining a situation where it would be a problem, performance-wise or otherwise.