Struct Reordering by compiler [duplicate]

Posted 2020-08-09 04:46

Suppose I have a struct like this:

struct MyStruct
{
  uint8_t var0;
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

This will probably waste some space (not a huge amount), because the uint32_t member has to be aligned.

In actuality (after aligning the structure so that it can actually use the uint32_t variable) it might look something like this:

struct MyStruct
{
  uint8_t var0;
  uint8_t unused[3];  //3 bytes of wasted space
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

A more efficient struct would be:

struct MyStruct
{
  uint8_t var0;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
  uint32_t var1;
};
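To see what the padding actually costs on a given platform, sizeof and offsetof can be checked directly. This is a minimal sketch, assuming a typical ABI where uint32_t needs 4-byte alignment; the names Padded and Reordered are made up here for illustration, and the exact sizes are implementation-defined:

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // uint8_t, uint32_t

// Original member order from the question (hypothetical name).
struct Padded
{
  uint8_t var0;
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

// Hand-reordered version (hypothetical name).
struct Reordered
{
  uint8_t var0;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
  uint32_t var1;
};

// Bytes that actually hold member data in either struct.
constexpr std::size_t payload = 4 * sizeof(uint8_t) + sizeof(uint32_t);

// Bytes of each struct that hold no member data (padding).
constexpr std::size_t padded_waste = sizeof(Padded) - payload;
constexpr std::size_t reordered_waste = sizeof(Reordered) - payload;

// Members of these structs sit at increasing offsets in declaration order.
static_assert(offsetof(Padded, var0) < offsetof(Padded, var1), "declaration order");
static_assert(offsetof(Padded, var1) < offsetof(Padded, var2), "declaration order");
```

On common x86-64 and ARM ABIs one would expect sizeof(Padded) to be 12 (three bytes of padding after var0 plus one byte of tail padding) and sizeof(Reordered) to be 8, but the standard guarantees neither value.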

Now, the question is:

Why is the compiler forbidden (by the standard) from reordering the struct?

I don't see any way you could shoot yourself in the foot if the struct were reordered.

8 Answers
聊天终结者
#2 · 2020-08-09 05:22

Imagine this struct layout is actually a memory sequence received 'over the wire', say an Ethernet packet. If the compiler reordered things to be more efficient, you would have to do loads of work pulling out bytes in the required order, rather than just using a struct that has all the correct bytes in the correct order and place.
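A minimal sketch of that idea follows. PacketHeader and parse_header are hypothetical names, not part of any real protocol, and the static_asserts encode the assumption that the target ABI adds no padding to this particular layout:

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // fixed-width integer types
#include <cstring>   // std::memcpy

// Hypothetical 8-byte wire header; the field names are made up for illustration.
struct PacketHeader
{
  uint8_t  type;
  uint8_t  flags;
  uint16_t length;    // arrives in the sender's byte order
  uint32_t sequence;  // arrives in the sender's byte order
};

// The overlay only works if the compiler keeps this order and adds no padding.
static_assert(sizeof(PacketHeader) == 8, "unexpected padding in PacketHeader");
static_assert(offsetof(PacketHeader, length) == 2, "unexpected layout");
static_assert(offsetof(PacketHeader, sequence) == 4, "unexpected layout");

// Copy the first received bytes straight into the struct.
// The caller must ensure `bytes` points to at least sizeof(PacketHeader) bytes.
inline PacketHeader parse_header(const unsigned char* bytes)
{
  PacketHeader header;
  std::memcpy(&header, bytes, sizeof header);
  return header;
}
```

Multi-byte fields would still need a byte-order conversion after the copy; the fixed member order only guarantees that each field lands at the expected offset.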

Explosion°爆炸
#3 · 2020-08-09 05:27

You also tagged C++, so I'll give you a practical reason why that can't happen.

Given there's no difference between class and struct, consider:

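#include <string>

class MyClass
{
    std::string s;
    anotherObject b;  // stands for any type constructible from a std::string

public:
    MyClass() : s{"hello"}, b{s}  // b is built from s, so s must be initialized first
    {}
};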

Now C++ requires non-static data members to be initialized in the order they were declared:

— Then, non-static data members are initialized in the order they were declared in the class definition

as per [class.base.init]/13. So the compiler cannot reorder fields within the class definition, because otherwise members that depend on the initialization of others (as in the example) couldn't work.

As far as I can tell, this rule alone does not strictly forbid the compiler from reordering the members in memory. But, especially considering the example above, it would be terribly painful to keep track of that. I also doubt reordering would bring any performance improvement, unlike padding.
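The declaration-order rule can be observed directly. This is a minimal sketch: init_log, Logged and Greeting are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: records the order in which members are constructed.
inline std::vector<std::string>& init_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical member type that logs its own construction.
struct Logged
{
    explicit Logged(const char* name) { init_log().push_back(name); }
};

struct Greeting
{
    Logged first{"first"};
    std::string s;
    std::size_t length;   // depends on s, so it must be declared after s
    Logged last{"last"};

    // Members are initialized top to bottom: first, s, length, last.
    Greeting() : s{"hello"}, length{s.size()} {}
};
```

Constructing a Greeting appends "first" and then "last" to the log, and length ends up as 5 because s is already initialized when length is computed. If length were declared before s, it would be computed from an uninitialized string.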

ゆ 、 Hurt°
#4 · 2020-08-09 05:31

Remember that automatic reordering of the elements to improve packing could do more than break specific memory layouts or binary serialization. The order of the members may also have been carefully chosen by the programmer to improve cache locality, keeping frequently used members together and away from rarely accessed ones.
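As a rough illustration of such a hand-chosen layout, the sketch below groups the frequently used fields first. The Connection type, its fields, and the 64-byte cache line size are all assumptions made for this example:

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types

// Assumed cache line size; 64 bytes is common but not universal.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical example type: hot fields are grouped first by hand.
struct Connection
{
  // Touched on every packet.
  uint64_t bytes_sent;
  uint64_t bytes_received;
  uint32_t state;
  uint32_t flags;

  // Touched rarely (setup, logging).
  char     peer_name[128];
  uint64_t created_at;
};

// Every hot field ends before the first cold field begins, and the hot
// region fits within one assumed cache line.
static_assert(offsetof(Connection, peer_name) <= kAssumedCacheLine,
              "hot fields no longer fit in one assumed cache line");
```

Whether the hot fields really share one cache line at run time also depends on how each object is aligned, so this check is only a necessary condition. A compiler that sorted members purely by size could move created_at in among the hot fields and undo the grouping.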

做自己的国王
#5 · 2020-08-09 05:33

The language designed by Dennis Ritchie defined the semantics of structures not in terms of behavior, but in terms of memory layout. If a structure S had a member M of type T at offset X, then the behavior of S.M was defined as taking the address of S, adding X bytes to it, interpreting it as a pointer to T, and interpreting the storage identified thereby as an lvalue. Writing a structure member would change the contents of its associated storage, and changing the contents of a member's storage would change the value of a member. Code was free to use a wide variety of ways of manipulating the storage associated with structure members, and the semantics would be defined in terms of operations on that storage.

Among the useful ways that code could manipulate the storage associated with a structure was the use of memcpy() to copy an arbitrary portion of one structure to a corresponding portion of another, or memset() to clear an arbitrary portion of a structure. Since structure members were laid out sequentially, a range of members could be copied or cleared using a single memcpy() or memset() call.

The language defined by the Standard Committee eliminates in many cases the requirement that changes to structure members must affect the underlying storage, or that changes to the storage affect the member values, making guarantees about structure layout less useful than they had been in Ritchie's language. Nonetheless, the ability to use memcpy() and memset() was retained, and retaining that ability required keeping structure elements sequential.
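The range copy and range clear described here might look like the following sketch. Record, copy_coords and clear_coords are hypothetical names; the code relies on the members x, y and z being laid out in declaration order, and it treats the object as raw bytes in the traditional C style rather than anything the C++ standard spells out in detail:

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy, std::memset

// Hypothetical record type used only for this illustration.
struct Record
{
  int  id;
  int  x;
  int  y;
  int  z;
  char tag[8];
};

// Byte range covering the members x, y and z (and any padding between them).
constexpr std::size_t kCoordsBegin = offsetof(Record, x);
constexpr std::size_t kCoordsEnd   = offsetof(Record, z) + sizeof(Record::z);

// Copy x..z from src to dst with one memcpy, leaving id and tag alone.
inline void copy_coords(Record& dst, const Record& src)
{
  std::memcpy(reinterpret_cast<unsigned char*>(&dst) + kCoordsBegin,
              reinterpret_cast<const unsigned char*>(&src) + kCoordsBegin,
              kCoordsEnd - kCoordsBegin);
}

// Zero x..z with one memset, leaving id and tag alone.
inline void clear_coords(Record& r)
{
  std::memset(reinterpret_cast<unsigned char*>(&r) + kCoordsBegin, 0,
              kCoordsEnd - kCoordsBegin);
}
```

If the compiler were free to place id or tag between x and z, the single memcpy() or memset() call would silently touch the wrong members.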

相关推荐>>
#6 · 2020-08-09 05:41

The compiler has to keep the members in order in case the structure is read by other low-level code, produced by another compiler or written in another language. Say you were creating an operating system, and you decide to write part of it in C, and part of it in assembly. You could define the following structure:

struct keyboard_input
{
    uint8_t modifiers;
    uint32_t scancode;
};

You pass this to an assembly routine, where you need to manually specify the memory layout of the structure. You would expect to be able to write the following code on a system with 4-byte alignment.

; The memory location of the structure is located in ebx in this example
mov al, [ebx]
mov edx, [ebx+4]

Now suppose the compiler changed the order of the members in an implementation-defined way. Depending on the compiler you use and the flags you pass to it, you could end up with either the modifiers member or the first byte of the scancode member in al.

Of course the problem is not limited to low-level interfaces with assembly routines. It would also appear if libraries built with different compilers called each other (e.g. building a program with MinGW that uses the Windows API).

Because of this, the language just forces you to think about the structure layout.
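One way to make the layout assumption explicit on the C or C++ side is to assert the offsets that the assembly hard-codes. This is a minimal sketch, assuming a target where uint32_t is 4-byte aligned, as in the example above:

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // uint8_t, uint32_t

struct keyboard_input
{
    uint8_t modifiers;
    uint32_t scancode;
};

// These mirror the addresses used by the assembly: [ebx] and [ebx+4].
static_assert(offsetof(keyboard_input, modifiers) == 0,
              "assembly reads modifiers at offset 0");
static_assert(offsetof(keyboard_input, scancode) == 4,
              "assembly reads scancode at offset 4");
static_assert(sizeof(keyboard_input) == 8,
              "assembly assumes an 8-byte structure");
```

If a different compiler or flag changed the padding, the build would fail here instead of the assembly routine quietly reading the wrong bytes.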

一纸荒年 Trace。
#7 · 2020-08-09 05:43

Why is the compiler forbidden (by the standard) from reordering the struct?

The basic reason is: for compatibility with C.

Remember that C was, originally, a high-level assembly language. It is quite common in C to view memory (network packets, ...) by reinterpreting the bytes as a specific struct.

This has led to multiple features relying on this property:

  • C guaranteed that the address of a struct and the address of its first data member are one and the same, so C++ does too (in the absence of virtual inheritance/methods).

  • C guaranteed that if you have two structs A and B that both start with a data member char followed by a data member int (and whatever after), then when you put them in a union you can write the B member and read the char and int through its A member, so C++ does too: Standard Layout.

The latter is extremely broad, and completely prevents any re-ordering of data members for most structs (or classes).
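Both guarantees can be sketched in a few lines. The types A, B and AB and the two helper functions are hypothetical names chosen for this illustration:

```cpp
#include <type_traits>

// Two standard-layout structs sharing a common initial sequence: char, int.
struct A { char c; int i; float  f; };
struct B { char c; int i; double d; };

static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");
static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");

union AB
{
    A a;
    B b;
};

// Write the B member, then read the shared leading int through the A member.
inline int read_int_through_a(const B& value)
{
    AB u;
    u.b = value;   // b is now the active member
    return u.a.i;  // permitted because c and i form a common initial sequence
}

// The address of a standard-layout struct equals the address of its first member.
inline bool first_member_shares_address(const A& a)
{
    return static_cast<const void*>(&a) == static_cast<const void*>(&a.c);
}
```

Reading u.a.f after writing u.b would not be covered by this rule, since f and d are not part of the common initial sequence. Both helpers depend on the compiler keeping c and i at the same offsets in A and B, which is exactly what a free reordering would break.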


Note that the Standard does allow some re-ordering: since C did not have the concept of access control, C++ (before C++23) specifies that the relative order of two data members with different access control specifiers is unspecified.

As far as I know, no compiler attempts to take advantage of it; but they could in theory.

Outside of C++, languages such as Rust allow compilers to re-order fields and the main Rust compiler (rustc) does so by default. Only historical decisions and a strong desire for backward compatibility prevent C++ from doing so.
