Is the member field order of a class “stable”?

2019-07-24 19:21发布

问题:

Considering c++ (or c++11), where I have some array of data with 2*N integers which represent N pairs. For each even i=0,2,4,6,...,2*N it holds that (data[i],data[i+1]) forms such a pair. Now I want to have a simple way to access these pairs without the need to write loops like:

for(int i=0; i<2*N; i+=2) { ... data[i] ... data[i+1] ... }

So I wrote this:

#include <iostream>

struct Pair {
    int first; int second;
};

int main() {
    int N=5;
    int data[10]= {1,2,4,5,7,8,10,11,13,14};
    Pair *pairs = (Pair *)data;

    for(int i=0; i<N; ++i)
        std::cout << i << ": (" << pairs[i].first << ", " << pairs[i].second << ")" << std::endl;

    return 0;
}

Output:

0: (1, 2)
1: (4, 5)
2: (7, 8)
3: (10, 11)
4: (13, 14)

ideaone: http://ideone.com/DyWUA8

As you can see, I cast the int pointer to a Pair pointer, such that c++ simply handles that my data is twice the size of an int. And I know, because that is how arrays work, that the data array is aligned in pairs of two sizeof(int)'s. However, I am not sure whether I can assume that a Pair is exactly two sizeof(int)'s and whether the member fields first and second are stored in that order (or alignment). Technically, in a worst case setting I can imagine that compiler stores 2 bytes for first, then 4 of second and then 2 of first (given that int's are 4 bytes), and somehow manages this. Of course, that is probably ludicrous, but is it allowed in c++?

Please note, I don't want to copy all the data to a new array and manually convert it to Pairs. Imho, that's an expensive operation for just syntax sugar.

May I assume the alignment of the Pair class? Does the same hold for structs? Are there other ways?

From what I read here ( How is the size of a C++ class determined? ) it is up to the compiler, and not in the language, of c++ how classes are aligned in memory. Does this mean that I am doomed to copy my data or use nasty syntax? Can I somehow force minimal alignment in the c++ language, or would I need compiler switches?

回答1:

Does this mean that I am doomed to copy my data or use nasty syntax?

No

Are there other ways?

Yes, use a wrapper class that provides the syntax you like. Here's one way

http://ideone.com/nitI0B

#include <iostream>

struct Pairs {
    int* _data;
    Pairs( int data[] ) : _data(data) {}
    int & first( size_t x ) const { return _data[x*2]; }
    int & second( size_t x ) const { return _data[x*2+1]; }
};

int main() {
    int N=5;
    int data[10]= {1,2,4,5,7,8,10,11,13,14};
    Pairs pairs( data );

    for(int i=0; i<N; ++i)
        std::cout << i << ": (" << pairs.first(i) << ", " << pairs.second(i) << ")" << std::endl;

    return 0;
}

Update

Here's a solution that wraps an int[2] in a struct (like C++11 std::array) but tolerates (in fact forces) padding by the compiler after the int[2]. The compiler is not likely to add any additional padding, but the standard doesn't preclude it. I've also added a random access iterator to allow passing iterators to std::sort and sort the original data as pairs. I did this for my one education, might be more trouble than it's worth.

See this in ideone

// http://stackoverflow.com/questions/23480041/is-the-member-field-order-of-a-class-stable
#include <iostream>
#include <algorithm>
#include <stddef.h>

struct Pair {
    int _data[2]; // _data[0] and _data[1] are consecutive,
                  // and _data[0] is at offset 0 (&Pair == &_data[0])
    int _unused[6]; // simulate the compiler inserted some padding here
    int first() { return _data[0]; }
    int second() { return _data[1]; }
    int & operator[] ( size_t x ) { return _data[x]; }
    friend inline bool operator< ( const Pair & lhs, const Pair & rhs ) {
        return lhs._data[0] < rhs._data[0];
    }
    // it is unlikely that the compiler will add any padding to a struct
    // Pair, so sizeof(Pair) == sizeof(_data)
    // however, the standard doesn't preclude it, so we define our own
    // copy constructor and assignment operator to ensure that nothing
    // extraneous is stored
    Pair( const Pair& other ) {
        _data[0] = other._data[0];
        _data[1] = other._data[1];
    }
    const Pair& operator=( const Pair& other ) {
        _data[0] = other._data[0];
        _data[1] = other._data[1];
        return *this;
    }
};

struct Pairs {
    int* _data;
    size_t _size;
    Pairs( int data[], size_t size ) : _data(data), _size(size) {}
    Pair & operator[] ( size_t x ) const {
        return *reinterpret_cast< Pair * >( _data + 2 * x );
    }
    class rai
        : public std::iterator<std::random_access_iterator_tag, Pair>
    {
        int * _p;
        size_t _size;
        size_t _x;
    public:
        rai() : _p(NULL), _size(0), _x(0) {}
        rai( int* data, size_t size )
            : _p(data), _size(size), _x(0) {}
        friend inline bool operator== (const rai& lhs, const rai& rhs) {
            return lhs._p == rhs._p && lhs._x == rhs._x;
        }
        friend inline bool operator!= (const rai& lhs, const rai& rhs) {
            return lhs._p != rhs._p || lhs._x != rhs._x;
        }
        Pair& operator* () const {
            return *reinterpret_cast< Pair * >( _p + 2 * _x );
        }
        rai& operator+=( ptrdiff_t n ) {
            _x += n;
            if (_x >= _size) { _x = _size = 0; _p = NULL; }
            return *this;
        }
        rai& operator-=( ptrdiff_t n ) {
            if (n > _x) _x = 0;
            else _x -= n;
            return *this;
        }
        friend inline rai operator+ ( rai lhs, const ptrdiff_t n ) {
            return lhs += n;
        }
        friend inline rai operator- ( rai lhs, const ptrdiff_t n ) {
            return lhs -= n;
        }
        friend inline bool operator< ( const rai & lhs, const rai & rhs ) {
            return *lhs < *rhs;
        }
        rai& operator++ () { return *this += 1; }
        rai& operator-- () { return *this -= 1; }
        friend inline ptrdiff_t operator-(const rai& lhs, const rai& rhs) {
            return lhs._p == NULL
                ? (rhs._p == NULL ? 0 : rhs._size - rhs._x)
                : lhs._x - rhs._x;
        }
    };
    inline rai begin() { return rai(_data,_size); }
    static inline const rai end() { return rai(); }
};

int main() {
    int N=5;
    int data[10]= {1,2,7,8,13,14,10,11,4,5};
    Pairs pairs( data, N );

    std::cout << "unsorted" << std::endl;
    for(int i=0; i<N; ++i)
       std::cout << i << ": (" << pairs[i].first() << ", "
                 << pairs[i].second() << ")" << std::endl;

    std::sort(pairs.begin(), pairs.end());

    std::cout << "sorted" << std::endl;
    for(int i=0; i<N; ++i)
        std::cout << i
            << ": (" << pairs[i][0]  // same as pairs[i].first()
            << ", "  << pairs[i][1]  // same as pairs[i].second()
            << ")" << std::endl;

    return 0;
}


回答2:

What you have done violates the strict aliasing rules and as such causes undefined behavior, in addition to any possible size and alignment issues.

The cleanest solution would be to store the data in logical pairs rather than as flat data, by doing a one-time conversion if needed. Don't worry about the performance of doing the data conversion unless profiling shows you that that is where your execution time is being spent. The clarity of grouping your data into pairs will almost certainly pay off in correctness in the long-term.

Alternately you could create inline functions that abstract away accessing notional "first" and "second" attributes of the flat array data.



回答3:

The alignment of a struct is at least the biggest alignment of its members, but it can be bigger. Also the compiler can add padding between your members, so your code is not safe.

Basiscally the only guarantees about struct layouts are:

  1. The first member is at offset 0.
  2. The members order in memory is the same as in the code.

You can use the first guarantee with this definition:

struct Pair {
    int p[2];
};

Now, sizeof(Pair) could be bigger than 2*sizeof(int) but that shouldn't matter too much.

Alternatively, if you want extra fun:

typedef int Pair[2];

Pointers to arrays are fun!

Anyway my advice is to do this:

int data[10]= {1,2,4,5,7,8,10,11,13,14};

for(int i=0; i<N; ++i)
{
    int *pair = data + 2*i;
    std::cout << i << ": (" << pairs[0] << ", " << pairs[1] << ")" << std::endl;
}

Or if you prefer the extra fun:

typedef int Pair[2];
int data[10]= {1,2,4,5,7,8,10,11,13,14};
Pair *pairs = (Pair*)data;

for(int i=0; i<N; ++i)
{
    std::cout << i << ": (" << pairs[i][0] << ", " << pairs[i][1] << ")" << std::endl;
}