What is the recommended way to align memory in C++

2019-01-10 02:25发布

问题:

I am working on a single producer single consumer ring buffer implementation.I have two requirements:

1) Align a single heap allocated instance of a ring buffer to a cache line.

2) Align a field within a ring buffer to a cache line (to prevent false sharing).

My class looks something like:

#define CACHE_LINE_SIZE 64  // To be used later.

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
  ....

private:
  std::atomic<int64_t> publisher_sequence_ ;
  int64_t cached_consumer_sequence_;
  T* events_;
  std::atomic<int64_t> consumer_sequence_;  // This needs to be aligned to a cache line.

};

Let me first tackle point 1 i.e. aligning a single heap allocated instance of the class. There are a few ways:

1) Use the c++ 11 alignas(..) specifier:

template<typename T, uint64_t num_events>
class alignas(CACHE_LINE_SIZE) RingBuffer {
public:
  ....

private:
  // All the private fields.

};

2) Use posix_memalign(..) + placement new(..) without altering the class definition. This suffers from not being platform independent:

 void* buffer;
 if (posix_memalign(&buffer, 64, sizeof(processor::RingBuffer<int, kRingBufferSize>)) != 0) {
   perror("posix_memalign did not work!");
   abort();
 }
 // Use placement new on a cache aligned buffer.
 auto ring_buffer = new(buffer) processor::RingBuffer<int, kRingBufferSize>();

3) Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {
public:
  ....

private:
  // All the private fields.

} __attribute__ ((aligned(CACHE_LINE_SIZE)));

4) I tried to use the C++ 11 standardized aligned_alloc(..) function instead of posix_memalign(..) but GCC 4.8.1 on Ubuntu 12.04 could not find the definition in stdlib.h

Are all of these guaranteed to do the same thing? My goal is cache-line alignment so any method that has some limits on alignment (say double word) will not do. Platform independence which would point to using the standardized alignas(..) is a secondary goal.

I am not clear on whether alignas(..) and __attribute__((aligned(#))) have some limit which could be below the cache line on the machine. I can't reproduce this any more but while printing addresses I think I did not always get 64 byte aligned addresses with alignas(..). On the contrary posix_memalign(..) seemed to always work. Again I cannot reproduce this any more so maybe I was making a mistake.

The second aim is to align a field within a class/struct to a cache line. I am doing this to prevent false sharing. I have tried the following ways:

1) Use the C++ 11 alignas(..) specifier:

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ alignas(CACHE_LINE_SIZE);
};

2) Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ __attribute__ ((aligned (CACHE_LINE_SIZE)));
};

Both these methods seem to align consumer_sequence to an address 64 bytes after the beginning of the object so whether consumer_sequence is cache aligned depends on whether the object itself is cache aligned. Here my question is - are there any better ways to do the same?

EDIT: The reason aligned_alloc did not work on my machine was that I was on eglibc 2.15 (Ubuntu 12.04). It worked on a later version of eglibc.

From the man page: The function aligned_alloc() was added to glibc in version 2.16.

This makes it pretty useless for me since I cannot require such a recent version of eglibc/glibc.

回答1:

Unfortunately the best I have found is allocating extra space and then using the "aligned" part. So the RingBuffer new can request an extra 64 bytes and then return the first 64 byte aligned part of that. It wastes space but will give the alignment you need. You will likely need to set the memory before what is returned to the actual alloc address to unallocate it.

[Memory returned][ptr to start of memory][aligned memory][extra memory]

(assuming no inheritence from RingBuffer) something like:

void * RingBuffer::operator new(size_t request)
{
     static const size_t ptr_alloc = sizeof(void *);
     static const size_t align_size = 64;
     static const size_t request_size = sizeof(RingBuffer)+align_size;
     static const size_t needed = ptr_alloc+request_size;

     void * alloc = ::operator new(needed);
     void *ptr = std::align(align_size, sizeof(RingBuffer),
                          alloc+ptr_alloc, request_size);

     ((void **)ptr)[-1] = alloc; // save for delete calls to use
     return ptr;  
}

void RingBuffer::operator delete(void * ptr)
{
    if (ptr) // 0 is valid, but a noop, so prevent passing negative memory
    {
           void * alloc = ((void **)ptr)[-1];
           ::operator delete (alloc);
    }
}

For the second requirement of having a data member of RingBuffer also 64 byte aligned, for that if you know that the start of this is aligned, you can pad to force the alignment for data members.



回答2:

The answer to your problem is std::aligned_storage. It can be used top level and for individual members of a class.



回答3:

After some more research my thoughts are:

1) Like @TemplateRex pointed out there does not seem to be a standard way to align to more than 16 bytes. So even if we use the standardized alignas(..)there is no guarantee unless the alignment boundary is less than or equal to 16 bytes. I'll have to verify that it works as expected on a target platform.

2) __attribute ((aligned(#))) or alignas(..) cannot be used to align a heap allocated object as I suspected i.e. new() doesn't do anything with these annotations. They seem to work for static objects or stack allocations with the caveats from (1).

Either posix_memalign(..) (non standard) or aligned_alloc(..) (standardized but couldn't get it to work on GCC 4.8.1) + placement new(..) seems to be the solution. My solution for when I need platform independent code is compiler specific macros :)

3) Alignment for struct/class fields seems to work with both __attribute ((aligned(#))) and alignas() as noted in the answer. Again I think the caveats from (1) about guarantees on alignment stand.

So my current solution is to use posix_memalign(..) + placement new(..) for aligning a heap allocated instance of my class since my target platform right now is Linux only. I am also using alignas(..) for aligning fields since it's standardized and at least works on Clang and GCC. I'll be happy to change it if a better answer comes along.



回答4:

I don't know if it is the best way to align memory allocated with a new operator, but it is certainly very simple !

This is the way it is done in thread sanitizer pass in GCC 6.1.0

#define ALIGNED(x) __attribute__((aligned(x)))

static char myarray[sizeof(myClass)] ALIGNED(64) ;
var = new(myarray) myClass;

Well, in sanitizer_common/sanitizer_internal_defs.h, it is also written

// Please only use the ALIGNED macro before the type.
// Using ALIGNED after the variable declaration is not portable!        

So I do not know why the ALIGNED here is used after the variable declaration. But it is an other story.