Sort elements, but keep certain ones fixed

2019-04-25 13:52发布

问题:

The function

template <typename Container, typename Comparator, typename Predicate>
void sortButKeepSomeFixed (Container& c, const Comparator& comp, const Predicate& pred)

is to sort the container c according to the ordering criterion comp, but those elements that satisfy pred shall remain fixed in their original positions after the sort (i.e. unaffected by the sort).

I tried to adapt quick sort to fit this, but could not think of it. In the end, I decided to adapt the crude selection sort to get the job done:

#include <iostream>
#include <vector>

std::vector<int> numbers = {5,7,1,8,9,3,20,2,11};

template <typename Container, typename Comparator, typename Predicate>
void sortButKeepSomeFixed (Container& c, const Comparator& comp, const Predicate& pred) {  // O(n^2), but want O(nlogn) on average (like quick sort or merge sort)
    const std::size_t N = c.size();
    std::size_t i, j, minIndex;
    for (i = 0; i < N-1; i++) {
        if (pred(c[i]))
            continue;  // c[i] shall not swap with any element.
        minIndex = i;
        for (j = i + 1; j < N; j++) {
            if (pred(c[j]))
                continue;  // c[j] shall not swap with any element.
            if (comp(c[j], c[minIndex]))
                minIndex = j;
        }
        if (minIndex != i)
            std::swap(c[i], c[minIndex]);
    }
}

int main() {
    sortButKeepSomeFixed (numbers,
        std::greater<int>(),  // Ordering condition.
        [](int x) {return x % 2 == 0;});  // Those that shall remain fixed.
    for (int x : numbers) std::cout << x << ' ';  // 11 9 7 8 5 3 20 2 1
}

But the time complexity is O(N^2) (I think). Can someone improve on the time complexity here, to perhaps O(NlogN) on average? In other words, find an overall better algorithm, using recursion or something like that?

Or perhaps a better idea is to take out the elements that satisfy pred, sort what left with std::sort and then put the extracted elements back in their original positions? Would that be any more efficient, or would that just make it worse?

Update: This is based on Beta's suggestion (sorting the iterators that don't pass pred). But though the elements that pass pred do indeed remain fixed, the sorting at the end is not correct.

template <typename Container, typename Comparator, typename Predicate>
void sortButKeepSomeFixed (Container& c, const Comparator& comp, const Predicate& pred) {
    std::vector<typename Container::iterator> iterators;
    for (typename Container::iterator it = c.begin();  it != c.end();  ++it) {
        if (!pred(*it))
            iterators.emplace_back(it);
    }
    std::vector<typename Container::iterator> originalIterators = iterators;
    std::sort(iterators.begin(), iterators.end(),
        [comp](const typename Container::iterator& x, const typename Container::iterator& y)
        {return comp(*x, *y);});
    for (int i = 0; i < originalIterators.size(); i++)
        *originalIterators[i] = *iterators[i];
}

The incorrect output is 11 9 9 8 11 3 20 2 9 when it should be 11 9 7 8 5 3 20 2 1.

回答1:

That's a fun one. I first tried to code the IMO correct approach, using a custom iterator that just skips elements that satisfy the predicate. This turned out to be quite challenging, at least writing that on a mobile phone as I'm doing it.

Basically, this should lead to code similar to what you can find in Eric Niebler's ranges v3.

But there's also the simpler, direct approach that you're trying to use above. The problem of your non working solution is, that it's changing the values the (rest of the sorted) iterators point to when assigning in that last for loop. This issue can be avoided by having a copy, like in my code:

int main(int, char **) {
 vector<int> input {1,2,3,4,5,6,7,8,9};
 vector<reference_wrapper<int>> filtered{begin(input), end(input)};
 filtered.erase(remove_if(begin(filtered), end(filtered),
         [](auto e) {return e%2==0;}), end(filtered));
 vector<int> sorted{begin(filtered), end(filtered)};
 // change that to contain reference wrappers to see the issue
 sort(begin(sorted), end(sorted),
      greater<int>{});
 transform(begin(filtered), end(filtered),
    begin(sorted),
    begin(filtered),
    [](auto to, auto from) {
      to.get() = from; return to;});
 copy(begin(input), end(input),
      ostream_iterator<int>{cout, ", "});
 return 0;
}

Live example here. Forgot to fork before modifying, sorry.

(Instead of using copies at last for types that are using heap allocated data move should probably be used. Though I'm not sure whether you can assign to a moved from object.)

Using a ... rather weird ... wrapper class instead of the std::reference_wrapper makes it possible to achieve the filtered sorting without having to use a vector with (copied or moved) elements of the value type:

template <class T>
class copyable_ref {
public:
  copyable_ref(T& ref) noexcept
  : _ptr(std::addressof(ref)), _copied(false) {}
  copyable_ref(T&&) = delete;
  copyable_ref(const copyable_ref& x) noexcept
  : _ptr (new int(*x._ptr)), _copied (true) {
  }
  ~copyable_ref() {
    if (_copied) {
      delete _ptr;
    }
  }
  copyable_ref& operator=(const copyable_ref& x) noexcept {
    *_ptr = *x._ptr;
  }
  operator T& () const noexcept { return *_ptr; }
  T& get() const noexcept { return *_ptr; }
private:
  T* _ptr;
  bool _copied;
};

Upon construction this class stores a pointer to it's argument, which is also modified when the copy assignment operator is used. But when an instance is copy constructed, then a heap allocated copy of the referenced (by the other) value is made. This way, it's possible to swap two referenced values with code similar to

Value a, b;
copyable_ref<Value> ref_a{a}, ref_b{b};
copyable_ref<Value> temp{ref_a};
ref_a = ref_b;
ref_b = temp;
// a and b are swapped

This was necessary because std::sort doesn't seem to use swap (found through ADL or std::swap) but code equivalent to the one above.

Now it's possible to sort a filtered "view" by filling a vector with (not copy constructed) instances of the weird wrapper class and sorting that vector. As the output in the example is showing, there's at most one heap allocated copy of a value type. Not counting the needed size for the pointers inside of the wrapper, this class enables filtered sorting with constant space overhead:

 vector<int> input {1,2,3,4,5,6,7,8,9};

 vector<copyable_ref<int>> sorted;
 sorted.reserve(input.size());
 for (auto & e : input) {
    if (e % 2 != 0) {
      sorted.emplace_back(e);
    }
 }
 sort(begin(sorted), end(sorted),
      greater<int>{});
 copy(begin(input), end(input),
      ostream_iterator<int>{cout, ", "});
 cout << endl;
 // 9 2 7 4 5 6 3 8 1

Finally, while this works quite well, I probably wouldn't use this in production code. I was especially surprised that std::sort wasn't using my own swap implementation, which led to this adventurous copy constructor.


You cannot generalise your code to work for sets and maps: Those are sorted by design, and they need that fixed order to function properly. And the unordered variants are, well, unordered and thus cannot maintain an order. But you can always (as long as you don't modify the container) use std::reference_wrappers inside of a vector to provide a sorted "view" of your data.



回答2:

Based on Beta's idea to sort using iterators, though I'm not sure what the time-complexity is. It also does not work with all containers, e.g. std::set, std::map.

template <typename Container, typename Comparator, typename Predicate>
void sortButKeepSomeFixed (Container& c, const Comparator& comp, const Predicate& pred) {
    std::vector<typename Container::value_type> toSort;
    std::vector<typename Container::iterator> iterators;
    for (typename Container::iterator it = c.begin();  it != c.end();  ++it) {
        if (!pred(*it)) {
            toSort.emplace_back(*it);
            iterators.emplace_back(it);
        }
    }
    std::sort(toSort.begin(), toSort.end(), comp);
    for (std::size_t i = 0; i < toSort.size(); i++)
        *iterators[i] = toSort[i];
}

std::vector<int> vector   = {5,7,1,8,9,3,20,2,11};
std::array<int, 9> array = {5,7,1,8,9,3,20,2,11};
std::list<int> list       = {5,7,1,8,9,3,20,2,11};
std::set<int> set         = {5,7,1,8,9,3,20,2,11};
std::map<double, int> map = { {1.5,5}, {1.2,7}, {3.5,1}, {0.5,8}, {5.2,9}, {7.5,3}, {0.1,20}, {1.8,2}, {2.4,11} };

template <typename Container>
void test (Container& container) {
    sortButKeepSomeFixed (container,
        std::greater<int>(),  // Ordering condition.
        [](int x) {return x % 2 == 0;});  // Those that shall remain fixed.
    for (int x : container) std::cout << x << ' ';
    std::cout << '\n';
}

int main() {
    test(vector);  // 11 9 7 8 5 3 20 2 1
    test(array);  // 11 9 7 8 5 3 20 2 1
    test(list);  // 11 9 7 8 5 3 20 2 1
    test(set);  // Does not compile.
    sortButKeepSomeFixed (map,
        [](const std::pair<double, int>& x, const std::pair<double, int>& y) {return x.second > y.second;},
        [](const std::pair<double, int>& x) {return x.second % 2 == 0;});
    for (const std::pair<double, int>& x : map)
        std::cout << "(" << x.first << "," << x.second << ") ";  // Does not compile.
}

Error for set and map is "Assignment of read-only location". Anyone know how to generalize this to work with sets and maps?

Update: So I suggest for set, maps, etc..., simply remove those elements that satisfy pred and the create a new set/map/... with Compare as their key_compare type. Like below. But it is only for set. How to generalize it to other containers that have key_compare types?

template <typename Container, typename Comparator, typename Predicate>
std::set<typename Container::value_type, Comparator, typename Container::allocator_type>
        sortButRemoveSomeElements (Container& c, const Comparator&, const Predicate& pred) {
    std::set<typename Container::value_type, Comparator, typename Container::allocator_type> set;
    std::vector<typename Container::value_type> keep;
    for (typename Container::iterator it = c.begin();  it != c.end();  ++it) {
        if (!pred(*it))
            keep.emplace_back(*it);
    }
    for (typename Container::value_type x : keep)
        set.emplace(x);  // Sorted by Comparator automatically due to std::set's insertion property.
    return set;
}

The test:

struct GreaterThan { bool operator()(int x, int y) const {return x > y;} };
std::set<int, GreaterThan> newSet = sortButRemoveSomeElements (set,
    GreaterThan{},  // Ordering condition.
    [](int x) {return x % 2 == 0;});  // Those that shall be removed.
for (int x : newSet) std::cout << x << ' ';  // 11 9 7 5 3 1