C++'s Strict Aliasing Rule - Is the 'char' aliasing exemption a 2-way street?

Posted 2020-02-25 07:47

Question:

Just a couple of weeks ago, I learned that the C++ Standard has a strict aliasing rule. Basically, I had asked a question about shifting bits: rather than shifting each byte one at a time, to maximize performance I wanted to load my processor's native registers (32 or 64 bits, respectively) and shift all 4/8 bytes with a single instruction.

This is the code I wanted to avoid:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };

for (int i = 0; i < 3; ++i)
{
  buffer[i] <<= 4; 
  buffer[i] |= (buffer[i + 1] >> 4);
}
buffer[3] <<= 4;

And instead, I wanted to use something like:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };
unsigned int *p = (unsigned int*)buffer; // unsigned int is 32 bit on my platform
*p <<= 4;

Someone pointed out in a comment that my proposed solution violated the C++ aliasing rules, because p was of type unsigned int* while buffer was an array of unsigned char, and I was dereferencing p to perform the shift. (Please ignore possible issues of alignment and byte order; I handle those outside of this snippet.) I was quite surprised to learn about the strict aliasing rule, since I regularly operate on data from buffers, casting it from one type to another, and have never had any issue. Further investigation revealed that the compiler I use (MSVC) doesn't exploit strict aliasing when optimizing, and since I only use gcc/g++ in my spare time as a hobby, I likely just hadn't encountered the issue yet.
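
For reference, here is a minimal sketch of a shift that stays within the rules: copy the bytes into a genuine std::uint32_t with std::memcpy, shift that, and copy the result back. The name shift_left_4_native is hypothetical. Like the original snippet, it works in native byte order, so it assumes byte order is handled elsewhere. Mainstream compilers usually turn such fixed-size memcpy calls into plain loads and stores, though that is an optimization, not a guarantee.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: shifts the first 4 bytes of `buffer`, viewed as a
// native-endian 32-bit unsigned integer, left by 4 bits.
// Assumes `buffer` points to at least 4 bytes; byte order is the caller's concern.
void shift_left_4_native(unsigned char* buffer) {
    std::uint32_t value;
    std::memcpy(&value, buffer, sizeof value);  // copy the bytes into a real uint32_t object
    value <<= 4;                                // operate on the uint32_t itself
    std::memcpy(buffer, &value, sizeof value);  // copy the result back into the buffer
}
```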

So then I asked a question about Strict Aliasing Rules and C++'s Placement new operator:

IsoCpp.org offers an FAQ entry on placement new that provides the following code example:

#include <new>        // Must #include this to use "placement new"
#include "Fred.h"     // Declaration of class Fred
void someCode()
{
  char memory[sizeof(Fred)];     // Line #1
  void* place = memory;          // Line #2
  Fred* f = new(place) Fred();   // Line #3 (see "DANGER" below)
  // The pointers f and place will be equal
  // ...
}

The example is simple enough, but I'm asking myself: what if someone calls a method on f, e.g. f->talk()? At that point we would be dereferencing f, which points to the same memory location as memory (of type char*). I've read in numerous places that there is an exemption allowing char* to alias any type, but I was under the impression that it wasn't a "two-way street": char* can alias (read/write) any type T, but a T* can only be used to access a char buffer if T itself is char. As I'm typing this, that doesn't make any sense to me, so I'm leaning towards the belief that the claim that my initial bit-shifting example violated the strict aliasing rule is false.

Can someone please explain what is correct? I've been going nuts trying to understand what is legal and what is not, despite having read numerous websites and SO posts on the topic.

Thank you

Answer 1:

The aliasing rule means that the language only promises your pointer dereferences to be valid (i.e. not trigger undefined behaviour) if:

  • You access an object through a pointer of a compatible type: either its actual class or one of its superclasses, properly cast. This means that if B is a superclass of D and you have a D* d pointing to a valid D, accessing the pointer returned by static_cast<B*>(d) is OK, but accessing the one returned by reinterpret_cast<B*>(d) is not. The latter may fail to account for the position of the B sub-object inside D.
  • You access it through a pointer to char (or unsigned char). Since char is byte-sized and byte-aligned, there is no way the data could be readable through a D* but not through a char*.
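
A small sketch of both bullets, using hypothetical types A, B and D. Because D has two non-empty bases, their sub-objects must sit at different addresses, so at most one of them can share D's address: static_cast performs the needed adjustment, whereas reinterpret_cast would simply reuse D's address. The hypothetical byte_sum reads any trivially copyable object byte by byte through unsigned char*, which the char exemption permits; for a uint32_t holding 0x01020304 the bytes sum to 10 whatever the byte order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration: D has two non-empty bases,
// so their sub-objects cannot both live at D's own address.
struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { int d = 3; };

// Reads B::b through a B* obtained with static_cast, which adjusts the
// pointer to the B sub-object inside D. reinterpret_cast<B*>(&obj) would
// keep D's address unchanged, which is why dereferencing it is not OK.
int read_base_b(D& obj) {
    B* pb = static_cast<B*>(&obj);
    return pb->b;
}

// Sums the bytes of a trivially copyable object by reading it through
// unsigned char*, which may inspect the bytes of any object.
template <typename T>
unsigned byte_sum(const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        sum += p[i];
    }
    return sum;
}
```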

That said, other rules in the standard (in particular those about array layout and POD types) can be read as ensuring that you can use pointers and reinterpret_cast<T*> to alias in both directions between POD types and char arrays, provided you make sure the char array has the appropriate size and alignment.

In other words, this is legal:

int* ia = new int[3];
char* pc = reinterpret_cast<char*>(ia);
// Possibly in some other function
int* pi = reinterpret_cast<int*>(pc);

While this may invoke undefined behaviour:

char* some_buffer; size_t offset; // Possibly passed in as an argument
int* pi = reinterpret_cast<int*>(some_buffer + offset);
pi[2] = -5;

Even if we can ensure that the buffer is big enough to contain three ints, the alignment might not be right. As with all instances of undefined behaviour, the compiler may do absolutely anything. Three common outcomes are:

  • The code might Just Work (TM) because on your platform the default alignment of all memory allocations is the same as that of int.
  • The pointer cast might round the address to the alignment of int (something like pi = pc & -4), potentially making you read/write to the wrong memory.
  • The pointer dereference itself may fail in some way: the CPU could reject misaligned accesses, making your application crash.
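
The alignment problem can be checked for, or sidestepped. This hedged sketch uses two hypothetical helpers: aligned_for_int tests an address against alignof(int) (the pointer-to-integer conversion it relies on is implementation-defined, though it behaves as expected on mainstream platforms), and load_int swaps the reinterpret_cast for std::memcpy, which has no alignment requirement and never forms an int* into the buffer.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical check: does `p` satisfy int's alignment requirement?
// Converting a pointer to std::uintptr_t is implementation-defined,
// but on mainstream platforms it yields the plain address.
bool aligned_for_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}

// Hypothetical alternative to reinterpret_cast<int*>(some_buffer + offset):
// memcpy copies the bytes into a real int, whatever the alignment of `p`.
int load_int(const unsigned char* p) {
    int value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```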

Since you always want to ward off UB like the devil itself, you need a char array with the correct size and alignment. The easiest way to get that is simply to start with an array of the "right" type (int in this case), then fill it through a char pointer, which would be allowed since int is a POD type.
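
A minimal sketch of that approach, assuming a hypothetical helper fill_ints_from_bytes and a fixed array of three ints: the storage is declared as int, so its size and alignment are right by construction, and the bytes are written into it through unsigned char*. The source bytes are assumed to be valid int representations produced on the same platform.

```cpp
#include <cstddef>

// Hypothetical helper: copies up to `n` raw bytes into storage that really
// is an array of int, writing through unsigned char*. Never writes past
// the end of `storage`, even if `n` is larger.
void fill_ints_from_bytes(int (&storage)[3], const unsigned char* src, std::size_t n) {
    unsigned char* dst = reinterpret_cast<unsigned char*>(storage);
    for (std::size_t i = 0; i < n && i < sizeof storage; ++i) {
        dst[i] = src[i];
    }
}
```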

Addendum: after using placement new, you will be able to call any function on the object. If the construction is correct and does not invoke UB due to the above, then you have successfully created an object at the desired place, so any calls are OK, even if the object is non-POD (e.g. because it has virtual functions). After all, any allocator class will likely use placement new to create objects in the storage it obtains. Note that this is only necessarily true if you use placement new; other kinds of type punning (e.g. naïve serialization with fread/fwrite) may result in an object that is incomplete or incorrect, because some values in the object need to be treated specially to maintain class invariants.
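
To make the addendum concrete, here is a minimal sketch of the full placement-new lifecycle for a non-POD type with a virtual function. Fred, Speaker, talk and use_placement_new are hypothetical stand-ins (the FAQ's Fred.h is not shown). Unlike the FAQ snippet, the storage is declared alignas(Fred) so it has both the size and the alignment the object needs, and the object is destroyed with an explicit destructor call, because placement new never destroys or frees anything on its own.

```cpp
#include <new>
#include <string>

// Hypothetical non-POD classes: a virtual function and a std::string member.
struct Speaker {
    virtual ~Speaker() = default;
    virtual std::string talk() const { return "..."; }
};

struct Fred : Speaker {
    std::string name = "Fred";
    std::string talk() const override { return "I am " + name; }
};

// Constructs a Fred in suitably aligned local storage, uses it through the
// pointer returned by placement new, then destroys it explicitly.
std::string use_placement_new() {
    alignas(Fred) unsigned char memory[sizeof(Fred)];  // size AND alignment of Fred
    Fred* f = new (memory) Fred();                     // the Fred's lifetime starts here
    Speaker* s = f;                                    // ordinary derived-to-base conversion
    std::string result = s->talk();                    // virtual call on the new object
    f->~Fred();                                        // placement new needs a manual destructor call
    return result;
}
```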



Answer 2:

As a matter of fact, the usual explanation of the standard's rule on pointer type punning in terms of "strict aliasing" is not necessarily correct or easy to understand. The standard doesn't mention 'strict aliasing' at all, and I find its original wording easier to understand and reason about.

In essence, it says that you can only access an object through a pointer to a related type that is suited to accessing this object (such as the same type or a related class type), or through a pointer to char (or unsigned char).
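
One concrete instance of a "related type" in that wording is the signed or unsigned counterpart of the object's type. The sketch below (read_int_as_unsigned is a hypothetical name) reads an int through an unsigned int pointer, which the list of permitted types allows; reading a float through an int pointer, by contrast, is not on that list. Expecting -1 to read back as UINT_MAX assumes two's complement representation, which C++20 guarantees and mainstream platforms use anyway.

```cpp
#include <climits>

// An int may be read through unsigned int*, because unsigned int is the
// unsigned type corresponding to int. No such permission exists for,
// say, reading a float through an int*.
unsigned read_int_as_unsigned(const int& x) {
    const unsigned* pu = reinterpret_cast<const unsigned*>(&x);
    return *pu;
}
```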

As you can see, the question of a 'two-way street' doesn't even arise.