mmap and C++ strict aliasing rules

Posted 2020-03-30 02:14

Question:

Consider a POSIX.1-2008 compliant operating system, and let fd be a valid file descriptor (to an open file, read mode, enough data...). The following code adheres to the C++11 standard* (ignore error checking):

void* map = mmap(NULL, sizeof(int)*10, PROT_READ, MAP_PRIVATE, fd, 0);
int* foo = static_cast<int*>(map);

Now, does the following instruction break strict aliasing rules?

int bar = *foo;

According to the standard:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.
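To illustrate the last bullet only, here is a minimal sketch. The helper name `same_bytes` is hypothetical and not part of the question. It inspects the object representation of an `int` through `unsigned char`, which the list above explicitly allows, whereas going the other way (reading `char` storage through an `int` glvalue) is what the question is about.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the question): compares the object
// representations of two ints byte by byte. Accessing an int through
// an unsigned char glvalue is permitted by the final bullet.
bool same_bytes(const int& a, const int& b) {
    const unsigned char* pa = reinterpret_cast<const unsigned char*>(&a);
    const unsigned char* pb = reinterpret_cast<const unsigned char*>(&b);
    for (std::size_t i = 0; i < sizeof(int); ++i) {
        if (pa[i] != pb[i]) return false;  // first differing byte
    }
    return true;
}
```

The rule is asymmetric: any object may be examined as bytes, but an array of bytes does not thereby become readable as an arbitrary other type.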

What is the dynamic type of the object pointed to by map / foo? Is that even an object? The standard says:

The lifetime of an object of type T begins when: storage with the proper alignment and size for type T is obtained, and if the object has non-trivial initialization, its initialization is complete.

Does this mean that the mapped memory contains 10 int objects (suppose that the initial address is aligned)? But if that is true, wouldn't it also apply to this code (which clearly breaks strict aliasing)?

char baz[sizeof(int)];
int* p=reinterpret_cast<int*>(&baz);
*p=5;

Even more oddly, does that mean that declaring baz starts the lifetime of any (properly aligned) object of size 4?


Some context: I am mmap-ing a file which contains a chunk of data which I wish to directly access. Since this chunk is large I'd like to avoid memcpy-ing to a temporary object.
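For comparison, a minimal sketch of the conventional way to read a value out of raw bytes without any aliasing question: copy each element with `std::memcpy` at the point of use instead of copying the whole chunk. The helper name `read_int_at` is hypothetical, and the sketch assumes the file stores `int`s in the host's native representation. Whether the per-element copy compiles down to a plain load depends on the compiler and should be checked, not assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the question): reads the index-th int
// from a raw byte region such as a mapping returned by mmap.
// Assumption: the bytes hold ints in the host's native representation.
int read_int_at(const void* base, std::size_t index) {
    int value;
    const unsigned char* bytes = static_cast<const unsigned char*>(base);
    // memcpy copies the object representation; no int glvalue ever
    // refers to the mapped storage, so strict aliasing is not involved.
    std::memcpy(&value, bytes + index * sizeof(int), sizeof(int));
    return value;
}
```

With the mapping from the question this would be called as `int bar = read_int_at(map, 0);`. It also works when the region is not suitably aligned for `int`.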


*Can nullptr be used instead of NULL here? Is it implicitly converted to a null pointer? Any reference from the standard?

Answer 1:

I believe simply casting does violate strict aliasing. Arguing that convincingly is above my paygrade, so here is an attempt at a workaround:

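#include <cstring>      // std::memcpy
#include <new>          // placement new
#include <type_traits>  // std::is_pod

template<class T>
T* launder_raw_pod_at( void* ptr ) {
  static_assert( std::is_pod<T>::value, "this only works with plain old data" );
  char buff[sizeof(T)];
  std::memcpy( buff, ptr, sizeof(T) );  // save the bytes currently at ptr
  T* r = ::new(ptr) T;                  // start the lifetime of a T at ptr
  std::memcpy( ptr, buff, sizeof(T) );  // put the saved bytes back
  return r;
}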

I believe the above code has zero observable side effects on memory and returns a valid T* that points to a T object living at location ptr.

Check whether your compiler optimizes the above code to a no-op. To do so, it has to understand memcpy at a really fundamental level, and constructing a T has to do nothing to the memory there.

At least clang 4.0.0 can optimize this operation away.

What we do is first copy the bytes away. Then we use placement new to create a T there. Finally, we copy the bytes back.

We have a legally created T with exactly the bytes we want in it.

But the copies out and back go through a local buffer, so they have no observable effect.

The construction of the object, if it is a POD, doesn't have to touch the bytes either; technically its value is indeterminate after default-initialization. But smart compilers simply do nothing.

So the compiler can work out that all this manipulation can be skipped at runtime. At the same time, we have in the abstract machine properly created an object with the proper bytes at that location. (assuming it has valid alignment! But that isn't this code's problem.)
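A usage sketch of the helper follows. Some parts are assumptions rather than the answer's own text: the wrapper name `first_int_of` is hypothetical, and the POD check is spelled as `std::is_trivial` plus `std::is_standard_layout` because `std::is_pod` is deprecated from C++20 onward. One caveat is worth checking in practice. In the abstract machine the helper writes through ptr, so if the compiler does not elide the copies, a mapping created with PROT_READ alone would fault. Mapping with PROT_READ | PROT_WRITE and MAP_PRIVATE avoids that, and private writes are not carried through to the file.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>

// Same technique as the answer's helper; the POD test is spelled out
// because std::is_pod is deprecated from C++20 onward.
template <class T>
T* launder_raw_pod_at(void* ptr) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "this only works with plain old data");
    char buff[sizeof(T)];
    std::memcpy(buff, ptr, sizeof(T));  // save the existing bytes
    T* r = ::new (ptr) T;               // begin the lifetime of a T at ptr
    std::memcpy(ptr, buff, sizeof(T));  // restore the saved bytes
    return r;
}

// Hypothetical wrapper (not from the answer): reads the first int of a
// writable, suitably aligned region through a properly created object.
int first_int_of(void* region) {
    int* p = launder_raw_pod_at<int>(region);
    return *p;
}
```

For an array of ten ints the same call would have to be repeated per element, or the helper adapted to array types. That adaptation is not shown here.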



Tags: c++ c++11 posix