How is a vector's data aligned?

Posted 2019-01-07 07:36

Question:

If I want to process data in a std::vector with SSE, I need 16 byte alignment. How can I achieve that? Do I need to write my own allocator? Or does the default allocator already align to 16 byte boundaries?

Answer 1:

The C++ standard requires allocation functions (malloc() and operator new()) to return memory suitably aligned for any standard type. Since these functions do not receive the alignment requirement as an argument, in practice every allocation gets the same alignment: that of the standard type with the largest alignment requirement, which is often long double and/or long long (see the boost max_align union).

Vector instructions, such as SSE and AVX, have stronger alignment requirements (16-byte aligned for 128-bit access and 32-byte aligned for 256-bit access) than that provided by the standard C++ allocation functions. posix_memalign() or memalign() can be used to satisfy such allocations with stronger alignment requirements.



Answer 2:

You should use a custom allocator with std:: containers, such as vector. Can't remember who wrote the following one, but I used it for some time and it seems to work (you might have to change _aligned_malloc to _mm_malloc, depending on compiler/platform):

#ifndef ALIGNMENT_ALLOCATOR_H
#define ALIGNMENT_ALLOCATOR_H

#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <new>       // placement new, std::bad_alloc
#include <stdlib.h>
#include <malloc.h>  // _aligned_malloc, _aligned_free (MSVC)

template <typename T, std::size_t N = 16>
class AlignmentAllocator {
public:
  typedef T value_type;
  typedef std::size_t size_type;
  typedef std::ptrdiff_t difference_type;

  typedef T * pointer;
  typedef const T * const_pointer;

  typedef T & reference;
  typedef const T & const_reference;

public:
  inline AlignmentAllocator () throw () { }

  template <typename T2>
  inline AlignmentAllocator (const AlignmentAllocator<T2, N> &) throw () { }

  inline ~AlignmentAllocator () throw () { }

  inline pointer address (reference r) {
    return &r;
  }

  inline const_pointer address (const_reference r) const {
    return &r;
  }

  // Allocates storage for n objects on an N-byte boundary.
  // Throws std::bad_alloc on overflow or failure, as containers expect,
  // instead of returning a null pointer.
  inline pointer allocate (size_type n) {
    if (n > max_size ())
      throw std::bad_alloc ();
    pointer p = (pointer)_aligned_malloc (n * sizeof (value_type), N);
    if (!p)
      throw std::bad_alloc ();
    return p;
  }

  inline void deallocate (pointer p, size_type) {
    _aligned_free (p);
  }

  inline void construct (pointer p, const value_type & value) {
    new (p) value_type (value);
  }

  inline void destroy (pointer p) {
    p->~value_type ();
  }

  inline size_type max_size () const throw () {
    return size_type (-1) / sizeof (value_type);
  }

  template <typename T2>
  struct rebind {
    typedef AlignmentAllocator<T2, N> other;
  };

  bool operator!=(const AlignmentAllocator<T,N>& other) const  {
    return !(*this == other);
  }

  // Returns true if and only if storage allocated from *this
  // can be deallocated from other, and vice versa.
  // Always returns true for stateless allocators.
  bool operator==(const AlignmentAllocator<T,N>& other) const {
    return true;
  }
};

#endif

Use it like this (change the 16 to another alignment, if needed):

std::vector<T, AlignmentAllocator<T, 16> > bla;

This, however, only ensures that the memory block std::vector uses is 16-byte aligned. If sizeof(T) is not a multiple of 16, some of your elements will not be aligned. Depending on your data type, this might be a non-issue. If T is int (4 bytes), only load elements whose index is a multiple of 4. If it's double (8 bytes), only multiples of 2, etc.

The real issue is if you use classes as T, in which case you will have to specify your alignment requirements in the class itself (again, depending on compiler, this might be different; the example is for GCC):

class __attribute__ ((aligned (16))) Foo {
    __attribute__ ((aligned (16))) double u[2];
};

We're almost done! If you use Visual C++ (at least, version 2010), you won't be able to use an std::vector with classes whose alignment you specified, because of std::vector::resize.

When compiling, if you get the following error:

C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector(870):
error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned

You will have to hack your std::vector header file:

  1. Locate the vector header file [C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector]
  2. Locate the void resize( _Ty _Val ) method [line 870 on VC2010]
  3. Change it to void resize( const _Ty& _Val ).


Answer 3:

Instead of writing your own allocator, as suggested before, you can use boost::alignment::aligned_allocator for std::vector like this:

#include <vector>
#include <boost/align/aligned_allocator.hpp>

template <typename T>
using aligned_vector = std::vector<T, boost::alignment::aligned_allocator<T, 16>>;


Answer 4:

Short Answer:

If sizeof(T)*vector.size() > 16, then yes, assuming your vector uses the default allocator.

Caveat: this holds only as long as alignof(std::max_align_t) >= 16, since that is the maximum alignment the default allocation functions must provide.

Long Answer:

Updated 25 Aug 2017 for the new standard draft N4659

If it is aligned for anything that is greater than 16 it is also aligned correctly for 16.

6.11 Alignment (Paragraph 4/5)

Alignments are represented as values of the type std::size_t. Valid alignments include only those values returned by an alignof expression for the fundamental types plus an additional implementation-defined set of values, which may be empty. Every alignment value shall be a non-negative integral power of two.

Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.

new and new[] return values that are aligned so that objects are correctly aligned for their size:

8.3.4 New (paragraph 17)

[ Note: when the allocation function returns a value other than null, it must be a pointer to a block of storage in which space for the object has been reserved. The block of storage is assumed to be appropriately aligned and of the requested size. The address of the created object will not necessarily be the same as that of the block if the object is an array. — end note ]

Note most systems have a maximum alignment. Dynamically allocated memory does not need to be aligned to a value greater than this.

6.11 Alignment (paragraph 2)

A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (21.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject.

Thus, as long as the memory allocated by your vector is larger than 16 bytes and alignof(std::max_align_t) >= 16, it will be correctly aligned on a 16-byte boundary.



Answer 5:

Write your own allocator. allocate and deallocate are the important ones. Here is one example:

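// Requires <cstdlib> (malloc, free), <cstdint> (std::uintptr_t), <new> (std::bad_alloc).
pointer allocate( size_type n, const void * /*hint*/ = 0 )
{
    // n is an element count: guard the byte-size computation against overflow.
    if( n > ( size_type(-1) - 16 ) / sizeof(T) )
        throw std::bad_alloc();

    // Over-allocate by 16 bytes so there is always room to round up.
    char * p = (char*)malloc( n * sizeof(T) + 16 );

    if( !p )
        throw std::bad_alloc();

    // Bytes to skip to reach the next 16-byte boundary. Always in 1..16,
    // so at least one byte is left in front of the block to store the offset.
    // std::uintptr_t avoids truncating the pointer on 64-bit targets.
    std::size_t difference = 16 - ( (std::uintptr_t)p & 15 );

    p += difference;
    p[ -1 ] = (char)difference;

    return (T*)p;
}

void deallocate( pointer p, size_type /*n*/ )
{
    char * pBuffer = (char*)p;

    // The byte just before the aligned block holds the offset applied in allocate().
    free( pBuffer - pBuffer[ -1 ] );
}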


Answer 6:

Don't assume anything about STL containers. Their interface/behaviour is defined, but not what's behind them. If you need raw access, you'll have to write your own implementation that follows the rules you'd like to have.



Answer 7:

Use __declspec(align(x,y)) as explained in Intel's vectorization tutorial, http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.pdf



Answer 8:

The Standard mandates that new and new[] return data aligned for any data type, which should include SSE. Whether or not MSVC actually follows that rule is another question.