Reorder vector using a vector of indices

2019-01-06 11:55发布

站内文章 / C++

36 0

三岁会撩人

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'd like to reorder the items in a vector, using another vector to specify the order:

char   A[]     = { 'a', 'b', 'c' };
size_t ORDER[] = { 1, 0, 2 };

vector<char>   vA(A, A + sizeof(A) / sizeof(*A));
vector<size_t> vOrder(ORDER, ORDER + sizeof(ORDER) / sizeof(*ORDER));

reorder_naive(vA, vOrder);

// A is now { 'b', 'a', 'c' }

The following is an inefficient implementation that requires copying the vector:

void reorder_naive(vector<char>& vA, const vector<size_t>& vOrder)  
{   
    assert(vA.size() == vOrder.size());  
    vector vCopy = vA; // Can we avoid this?  
    for(int i = 0; i < vOrder.size(); ++i)  
        vA[i] = vCopy[ vOrder[i] ];  
}

Is there a more efficient way, for example, that uses swap()?

回答1:

I improved chmike's algorithm. This function agrees with his for all 11! permutations of (0..10) passed as the reordering vector. Also it doesn't modify reordering vector.

template< class T >
void reorder(vector<T> &v, vector<size_t> const &order )  {   
    for ( int s = 1, d; s < order.size(); ++ s ) {
        for ( d = order[s]; d < s; d = order[d] ) ;
        if ( d == s ) while ( d = order[d], d != s ) swap( v[s], v[d] );
    }
}

Here's an STL style version which I put a bit more effort into. It's about 47% faster (that is, almost twice as fast over (0..10)!) because it does all the swaps as early as possible and then returns. The reorder vector consists of a number of orbits, and each orbit is reordered upon reaching its first member. It's faster when the last few elements do not contain an orbit.

template< typename order_iterator, typename value_iterator >
void reorder( order_iterator order_begin, order_iterator order_end, value_iterator v )  {   
    typedef typename iterator_traits< value_iterator >::value_type value_t;
    typedef typename iterator_traits< order_iterator >::value_type index_t;
    typedef typename iterator_traits< order_iterator >::difference_type diff_t;

    diff_t remaining = order_end - 1 - order_begin;
    for ( index_t s = index_t(), d; remaining > 0; ++ s ) {
        for ( d = order_begin[s]; d > s; d = order_begin[d] ) ;
        if ( d == s ) {
            -- remaining;
            value_t temp = v[s];
            while ( d = order_begin[d], d != s ) {
                swap( temp, v[d] );
                -- remaining;
            }
            v[s] = temp;
        }
    }
}

And finally, just to answer the question once and for all, a variant which does destroy the reorder vector. (It fills it with -1's.) It's about 16% faster than the preceding version. This one uses an ugly typecast, but deal with it. This covers 11! ~= 40 mil permutations of 11 characters in 4.25 seconds, not counting overhead, on my 2.2 GHz laptop.

template< typename order_iterator, typename value_iterator >
void reorder_destructive( order_iterator order_begin, order_iterator order_end, value_iterator v )  {
    typedef typename iterator_traits< value_iterator >::value_type value_t;
    typedef typename iterator_traits< order_iterator >::value_type index_t;
    typedef typename iterator_traits< order_iterator >::difference_type diff_t;

    diff_t remaining = order_end - 1 - order_begin;
    for ( index_t s = index_t(); remaining > 0; ++ s ) {
        index_t d = order_begin[s];
        if ( d == (diff_t) -1 ) continue;
        -- remaining;
        value_t temp = v[s];
        for ( index_t d2; d != s; d = d2 ) {
            swap( temp, v[d] );
            swap( order_begin[d], d2 = (diff_t) -1 );
            -- remaining;
        }
        v[s] = temp;
    }
}

回答2:

Here is the correct code

void REORDER(vector<char>& vA, vector<size_t>& vOrder)  
{   
    assert(vA.size() == vOrder.size());

    // for all elements to put in place
    for( int i = 0; i < va.size() - 1; ++i )
    { 
        // while the element i is not yet in place 
        while( i != vOrder[i] )
        {
            // swap it with the element at its final place
            int alt = vOrder[i];
            swap( vA[i], vA[alt] );
            swap( vOrder[i], vOrder[alt] );
        }
    }
}

note that you can save one test because if n-1 elements are in place the last nth element is certainly in place.

On exit vA and vOrder are properly ordered.

This algorithm performs at most n-1 swapping because each swap moves the element to its final position. And we'll have to do at most 2N tests on vOrder.

回答3:

If it is ok to modify the ORDER array then an implementation that sorts the ORDER vector and at each sorting operation also swaps the corresponding values vector elements could do the trick, I think.

回答4:

It appears to me that vOrder contains a set of indexes in the desired order (for example the output of sorting by index). The code example here follows the "cycles" in vOrder, where following a sub-set (could be all of vOrder) of indexes will cycle through the sub-set, ending back at the first index of the sub-set.

Wiki article on "cycles"

https://en.wikipedia.org/wiki/Cyclic_permutation

In the following example, every swap places at least one element in it's proper place. This code example effectively reorders vA according to vOrder, while "unordering" or "unpermuting" vOrder back to its original state (0 :: n-1). If vA contained the values 0 through n-1 in order, then after reorder, vA would end up where vOrder started.

template <class T>
void reorder(vector<T>& vA, vector<size_t>& vOrder)  
{   
    assert(vA.size() == vOrder.size());

    // for all elements to put in place
    for( size_t i = 0; i < vA.size(); ++i )
    { 
        // while vOrder[i] is not yet in place 
        // every swap places at least one element in it's proper place
        while(       vOrder[i] !=   vOrder[vOrder[i]] )
        {
            swap( vA[vOrder[i]], vA[vOrder[vOrder[i]]] );
            swap(    vOrder[i],     vOrder[vOrder[i]] );
        }
    }
}

This can also be implemented a bit more efficiently using moves instead swaps. A temp object is needed to hold an element during the moves. Example C code, reorders A[] according to indexes in I[], also sorts I[] :

void reorder(int *A, int *I)
{    
int i, j, k;
int tA;
    /* reorder A according to I */
    /* every move puts an element into place */
    /* time complexity is O(n) */
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++){
        if(i != I[i]){
            tA = A[i];
            j = i;
            while(i != (k = I[j])){
                A[j] = A[k];
                I[j] = j;
                j = k;
            }
            A[j] = tA;
            I[j] = j;
        }
    }
}

回答5:

Never prematurely optimize. Meassure and then determine where you need to optimize and what. You can end with complex code that is hard to maintain and bug-prone in many places where performance is not an issue.

With that being said, do not early pessimize. Without changing the code you can remove half of your copies:

    template <typename T>
    void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
    {
       std::vector<T> tmp;         // create an empty vector
       tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
       for ( std::size_t i = 0; i < order.size(); ++i ) {
          tmp.push_back( data[order[i]] );
       }
       data.swap( tmp );          // swap vector contents
    }

This code creates and empty (big enough) vector in which a single copy is performed in-order. At the end, the ordered and original vectors are swapped. This will reduce the copies, but still requires extra memory.

If you want to perform the moves in-place, a simple algorithm could be:

template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
   for ( std::size_t i = 0; i < order.size(); ++i ) {
      std::size_t original = order[i];
      while ( i < original )  {
         original = order[original];
      }
      std::swap( data[i], data[original] );
   }
}

This code should be checked and debugged. In plain words the algorithm in each step positions the element at the i-th position. First we determine where the original element for that position is now placed in the data vector. If the original position has already been touched by the algorithm (it is before the i-th position) then the original element was swapped to order[original] position. Then again, that element can already have been moved...

This algorithm is roughly O(N^2) in the number of integer operations and thus is theoretically worse in performance time as compare to the initial O(N) algorithm. But it can compensate if the N^2 swap operations (worst case) cost less than the N copy operations or if you are really constrained by memory footprint.

回答6:

You could do it recursively, I guess - something like this (unchecked, but it gives the idea):

// Recursive function
template<typename T>
void REORDER(int oldPosition, vector<T>& vA, 
             const vector<int>& vecNewOrder, vector<bool>& vecVisited)
{
    // Keep a record of the value currently in that position,
    // as well as the position we're moving it to.
    // But don't move it yet, or we'll overwrite whatever's at the next
    // position. Instead, we first move what's at the next position.
    // To guard against loops, we look at vecVisited, and set it to true
    // once we've visited a position.
    T oldVal = vA[oldPosition];
    int newPos = vecNewOrder[oldPosition];
    if (vecVisited[oldPosition])
    {
        // We've hit a loop. Set it and return.
        vA[newPosition] = oldVal;
        return;
    }
    // Guard against loops:
    vecVisited[oldPosition] = true;

    // Recursively re-order the next item in the sequence.
    REORDER(newPos, vA, vecNewOrder, vecVisited);

    // And, after we've set this new value, 
    vA[newPosition] = oldVal;
}

// The "main" function
template<typename T>
void REORDER(vector<T>& vA, const vector<int>& newOrder)
{
    // Initialise vecVisited with false values
    vector<bool> vecVisited(vA.size(), false);

    for (int x = 0; x < vA.size(); x++)
    {
        REORDER(x, vA, newOrder, vecVisited);
    }
}

Of course, you do have the overhead of vecVisited. Thoughts on this approach, anyone?

回答7:

To iterate through the vector is O(n) operation. Its sorta hard to beat that.

回答8:

Your code is broken. You cannot assign to vA and you need to use template parameters.

vector<char> REORDER(const vector<char>& vA, const vector<size_t>& vOrder)  
{   
    assert(vA.size() == vOrder.size());  
    vector<char> vCopy(vA.size()); 
    for(int i = 0; i < vOrder.size(); ++i)  
        vCopy[i] = vA[ vOrder[i] ];  
    return vA;
}

The above is slightly more efficient.

回答9:

It is not clear by the title and the question if the vector should be ordered with the same steps it takes to order vOrder or if vOrder already contains the indexes of the desired order. The first interpretation has already a satisfying answer (see chmike and Potatoswatter), I add some thoughts about the latter. If the creation and/or copy cost of object T is relevant

template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> & order )
{
 std::size_t i,j,k;
  for(i = 0; i < order.size() - 1; ++i) {
    j = order[i];
    if(j != i) {
      for(k = i + 1; order[k] != i; ++k);
      std::swap(order[i],order[k]);
      std::swap(data[i],data[j]);
    }
  }
}

If the creation cost of your object is small and memory is not a concern (see dribeas):

template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
 std::vector<T> tmp;         // create an empty vector
 tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
 for ( std::size_t i = 0; i < order.size(); ++i ) {
  tmp.push_back( data[order[i]] );
 }
 data.swap( tmp );          // swap vector contents
}

Note that the two pieces of code in dribeas answer do different things.

回答10:

I was trying to use @Potatoswatter's solution to sort multiple vectors by a third one and got really confused by output from using the above functions on a vector of indices output from Armadillo's sort_index. To switch from a vector output from sort_index (the arma_inds vector below) to one that can be used with @Potatoswatter's solution (new_inds below), you can do the following:

vector<int> new_inds(arma_inds.size());
for (int i = 0; i < new_inds.size(); i++) new_inds[arma_inds[i]] = i;

回答11:

It's an interesting intellectual exercise to do the reorder with O(1) space requirement but in 99.9% of the cases the simpler answer will perform to your needs:

void permute(vector<T>& values, const vector<size_t>& indices)  
{   
    vector<T> out;
    out.reserve(indices.size());
    for(size_t index: indices)
    {
        assert(0 <= index && index < values.size());
        out.push_back(values[index]);
    }
    values = std::move(out);
}

Beyond memory requirements, the only way I can think of this being slower would be due to the memory of out being in a different cache page than that of values and indices.

回答12:

I came up with this solution which has the space complexity of O(max_val - min_val + 1), but it can be integrated with std::sort and benefits from std::sort's O(n log n) decent time complexity.

std::vector<int32_t> dense_vec = {1, 2, 3};
std::vector<int32_t> order = {1, 0, 2};

int32_t max_val = *std::max_element(dense_vec.begin(), dense_vec.end());
std::vector<int32_t> sparse_vec(max_val + 1);

int32_t i = 0;
for(int32_t j: dense_vec)
{
    sparse_vec[j] = order[i];
    i++;
}

std::sort(dense_vec.begin(), dense_vec.end(),
    [&sparse_vec](int32_t i1, int32_t i2) {return sparse_vec[i1] < sparse_vec[i2];});

The following assumptions made while writing this code:

Vector values start from zero.
Vector does not contain repeated values.
We have enough memory to sacrifice in order to use std::sort

回答13:

This should avoid copying the vector:

void REORDER(vector<char>& vA, const vector<size_t>& vOrder)  
{   
    assert(vA.size() == vOrder.size()); 
    for(int i = 0; i < vOrder.size(); ++i)
        if (i < vOrder[i])
            swap(vA[i], vA[vOrder[i]]);
}