Say I have a large code using Array of Structures (AoS) memory layout. I would like to build a zero-cost abstraction in C++ which allows me to switch between AoS and SoA with as little refactoring effort as possible. For instance take a class with access member functions
struct Item{
auto& myDouble(){ return mDouble; }
auto& myChar(){ return mChar; }
auto& myString(){ return mString; }
private:
double mDouble;
char mChar;
std::string mString;
};
which is used inside a container in a loop
std::vector<Item> vec_(1000);
for (auto& i : vec_)
i.myDouble()=5.;
I would like to change the first snippet while the second one remains similar.. e.g. having something like
MyContainer<Item, SoA> vec_(1000)
for (auto& i : vec_)
i.myDouble()=5.;
in which I can select the memory layout with a "SoA" or "AoS" template parameters. My questions are: does such thing exist somewhere? And if it doesn't, how would it be implemented at best?
To achieve what you want, you simply have to make your new structure, iterable. Forgive my Java lingo, what I mean by iterable in C++, is simply that you should create functions within your class called
begin
andend
. These should return an iterator object which has the(pre)++
or++(post)
overloaded, as well as the*(pointer)
operator.Another way is this: Why use non-member begin and end functions in C++11?
This will now allow you simply swap the container type and have the for-range loop still work the way it should.
I implemented a generic solution, I'll explain it here below (it will be a long post). This is not the only possible answer of course, and it'd be great to gather feedback. I placed the full code of this solution here https://github.com/crosetto/SoAvsAoS
We create two helper classes which given an item generate the container type as a vector of tuples or a tuple of vectors, depending on a tag template argument. We call this class a DataLayoutPolicy and we are going to use it e.g. in this way:
to generate a tuple of vectors of char, int and double.
This class will only contain static member functions to interact with the container (e.g. extract an element, insert, resize, etc...). We write two template specializations. The first one (trivial) for the array of structures situation:
... just forwarding. We do the same for the structure of arrays case.
Note: there are a few things to be explained about the code below.
It wraps all the types in a ref_wrap type, which is a "decorated" std::reference_wrapper. This is because we want to access the elements as lvalue references, to be able to change their values. using a regular reference we would be in trouble if e.g. Types contains any reference. One thing worth noticing is that in the AoS case the DataLayoutPolicy::value_type is a reference, while in the SoA case is the value of a ref_wrap type.
we return by value a newly created tuple of ref_wrap of the values. This is surpisingly OK, because the compiler is optimizing all copies away, and it is even more OK in C++17 (the returned tuple is a 'prvalue'), because of the guaranteed copy elision added to the standard: the tuple is not copied, this code would work even if std::tuple and std::reference_wrapper didn't have a copy/move constructor.
we use a std::integer sequence to statically unroll a parameter pack: this is ugly but it is "the way" to do it since C++14 (and in C++11 one had to use template recursion to achieve the same). There is not yet such thing like a "for_each" for parameter packs.
we use C++17 fold expressions to call a function returning void multiple times. Before C++17 this was achieved concisely with tricky hacks.
So now this code shows pretty clearly how this abstraction can be built. We show below a possible strategy to use it. We define the policy_t type using the DataLayoutPolicy and a generic TItem type
The container class forwards most of the calls to the static functions defined by the policy_t type. It might look like the following
Now this is no standard container, so we have to define an iterator in order to use it whithin STL algorithms. The iterator we build looks like a STL iterator for a container of tuple, except for the fact that it must hold a reference to the container, because when we call the dereference operator we want to call our storage's operator[], which statically dispatches the operation using the container's data layout policy.
Eventually we define our "item" data structure: we make it a decorator of a std::tuple, with some specific member functions (in this case only getters/setters).
When we call Item's member functions we have to rely on compiler optimization in order for our abstraction to be "zero-cost": we don't want to call the Item constructor, because we are creating a temporary tuple just to access one of it's members each time and then we thrash it right away.
so eventually we can write the program:
and we can write generic and efficient code regardless of the memory layout underneath. What's left to do is to check that this is a zero cost abstraction. The easiest way for me to check that is using a debugger: compile the example with debug symbols on,
run it with gdb, set a brakpoint in the for loop, and step through the assembly instructions (the layout split command shows the source code and disassembled instructions at the same time)
The instructions being executed inside the loop are in case of AoS data layout
Notice in particular that in the second line the offset being add to compute the address is 0x160. This changes depending on the size of the data members in the item object. On the other hand for the SoA data structure we have
We see the loop is unrolled and vectorized by Clang (version 6.0.0), and the increment for the address is 0x20, independent of the number of data members present in the item struct.