I have a "column" container type:
struct MyColumnType {
// Data: Each row represents a member of an object.
vector<double> a; // All vectors are guaranteed to have always
vector<string> b; // the same length.
vector<int> c;
void copy(int from_pos, int to_pos); // The column type provides an interface
void swap(int pos_a, int pos_b); // for copying, swapping, ...
void push_back(); // And for resizing the container.
void pop_back();
void insert(int pos);
void remove(int pos);
// The interface can be extended/modified if required
};
Usage:
// If table is a constructed container with elements stored
// To acces the members of the object stored at the 4th position:
table.a[4] = 4.0;
table.b[4] = "4th";
table.c[4] = 4;
Question: How can I create a standard-compliant random access iterator (and probably a required proxy reference type) for this kind of container?
I want to be able to use std::algorithms
for random access iterators with my type, e.g. sort
(note: for sorting the comparison would be provided by an user-defined functor, e.g. a lambda).
In particular the iterator should provide an interface similar to
struct {
double& a;
string& b;
int& c;
};
Note 0: C++11/C++14 is allowed.
Note 1: There is an old paper http://hci.iwr.uni-heidelberg.de/vigra/documents/DataAccessors.ps where a similar attempt is undertaken. However, I haven't been able to get their approach working with sort. Requirements like defaultConstructible are hard to satisfy using a proxy type approach (why does std::sort
require types to be default constructible instead of swappable is beyond my understanding).
Note 2: I cannot do the following:
struct MyType {
double a;
string b;
int c;
};
std::vector<MyType> v;
and then use std::algorithm
.
Motivation: Performance. A cache-line is usually 64bytes, i.e. 8 doubles. In this simple struct if you iterate over the doubles, you are polluting a cache-line with a string an an int. In other cases, you might get only 1 double transfered per cache-line. That is, you end up using 1/8-th of the memory bandwith available. If you need to iterate over a couple of Gb of doubles, this simple decision improves your application performance by a factor of 6-7x. And no, I cannot give that up.
Bonus: the answer should be as generic as possible. Think about adding/removing fields to the container type as adding/removing members to a struct. You don't want to change a lot of code every time you add a new member.
I think something like this could be Standard-compliant. It uses some C++11 features to simplify the syntax, but could as well be changed to comply C++03 AFAIK.
Tested and works with clang++3.2
Prelude:
The iterator classes: a
value_type
(all_copy
), areference
type (all_reference
) and the iterator type (all_iterator
). Iterating is done by keeping and updating three iterators (one to eachvector
). I don't know if that's the most performant option, though.How it works:
std::iterator_traits
defines several associated types for an iterator: [iterator.traits]/1Therefore, you can introduce a struct (
all_reference
) keeping three references asreference
type. This type is the return value of*a
, wherea
is of the iterator type (possiblyconst
-qualified). There needs to be a differentvalue_type
because some Standard Library algorithms such assort
might want to create a local variable temporarily storing the value of*a
(by copy or move into the local variable). In this case,all_copy
provides this functionality.You're not required to use it (
all_copy
) in you own loops, where it could affect performance.There needs to be a comparison function for
std::sort
. For some reason we have to provide all three.Now, the iterator class:
Usage example:
Output:
If you're really concerned about performance and you want to sort your container with
std::sort
, use the overload that allows you to provide a custom comparison object:.. and sort an array of indices into the container. Here's how:
You'll need the following members in your container:
Then define the following comparison object:
And use it to sort an array of indices:
The 'less' member of the container controls which order to sort in:
The sorted array of indices can be used in further algorithms - you can avoid copying the actual data around until you need to.
All
std
algorithms that work with RandomAccessIterators have overloads that allow you to specify custom comparison objects, so they can also be used with this technique.