C++0x issue: Constant time insertion into std::set

2020-07-05 05:25发布

问题:

According to this page, I can achieve constant time insertion if I use

iterator std::set::insert ( iterator position, const value_type& x );

and the position iterator I provide directly "precedes" the proper (in-order) insertion point.

Now the case I'm concerned with is if I know that the value I'm inserting goes at the end (since it's the largest), e.g.:

set<int> foo = {1, 2, 3};
foo.insert(4); // this is an inefficient insert

According to the above criterion I should pass the last element foo.end()-1 to insert not foo.end(). Is my understanding correct? What happens if I pass foo.end()? Will it be a O(log n) insertion or a O(1) one. So, the options are:

// Option A
foo.insert(foo.end()-1, 4);

// Option B
foo.insert(foo.end(), 4);

// Safer version of Option A
if(foo.empty())
    foo.insert(4);
else
    foo.insert(foo.end()-1, 4);

I ask because I'm writing a function that's templated on the container. I want to insert an element (that I know is the largest) to the end of whatever container is passed in. Using "Option A" above has a different behavior for a container like vector:

foo.insert(foo.end()-1, 4);
// result is {1, 2, 3, 4} if foo is an std::set
// result is {1, 2, 4, 3} if foo is an std::vector

As @Bo_Persson suggests, the problem here is that C++03 says "logarithmic in general, but amortized constant if t is inserted right after p." while C++0x says "logarithmic in general, but amortized constant if t is inserted right before p."

PS: I'm using GCC 4.5 on Ubuntu 11.04 with C++0x support enabled.

Edit: I ran empirical tests with C++0x support enabled and disabled and posted the results in an answer. Basically, the conclusion is that it's just as good (and is obviously safer) to provide end() as the insertion hint. However, that's obviously just an empirical observation. The standard, as stated, is still confusing on this aspect.

回答1:

Is it cheating to run a test instead of reading through library specifications?

For g++-4.4 -O2 for the integers 0 <= i < 5000000 my running times for standard insertion are

real    0m14.952s
user    0m14.665s
sys 0m0.268s

and my running times for insertion using end() as hint are

real    0m4.373s
user    0m4.148s
sys 0m0.224s

Insertion at end() - 1 is just as fast as far as I can tell, but it is more cumbersome to use because end() - 1 is an illegal operation (you have to use operator--()) and it crashes if the set happens to be empty.

#include <set>

typedef std::set<int> Set;

void insert_standard(Set& xs, int x)
{
    xs.insert(x);
}

void insert_hint_end(Set& xs, int x)
{
    xs.insert(xs.end(), x);
}

int main()
{
    const int cnt = 5000000;
    Set xs;
    for (int i = 0; i < cnt; i++) {
        // insert_hint_end(xs, i);
        insert_standard(xs, i);
    }
}


回答2:

It is not totally clear if the position should be pointing before or after the insertion point. Some implementations work with either.

On the other hand, if you want different behavior for different containers, why don't you just write two overloads for your function, one for containers with a push_back function and one for std::set.



回答3:

Only supplying an iterator that falls immediately after the new value makes any sense.

That's because in a collection of N elements, there are N+1 possible insertion points. An iterator exists that comes after a value higher than any preexisting element, but there is no iterator that comes before a value before all elements.



回答4:

Following in the footsteps of @antonakos, I'm expanding on the "cheating" solution and running an empirical test. I'm using GCC 4.5 with optimization (-02) and considering both the case when C++0x support is not enabled and when it is with -std=c++0x. Results on 40,000,000 insertions are as follows (showing system time as the other values in this case are not special):

  • Without C++0x support
    • No hint: 26.6 seconds
    • Hint at end(): 5.71 seconds
    • Hint at --end(): 5.84 seconds
  • With C++0x support enabled
    • No hint: 29.2 seconds
    • Hint at end(): 5.34 seconds
    • Hint at --end(): 5.54 seconds

Conclusion: GCC (with or without C++0x enabled) inserts efficiently when end() is provided as the insertion hint.

The code I used is based on @antonakos's:

#include <set>
typedef std::set<int> Set;

void insert_standard(Set & xs, int x) {
    xs.insert(x);
}

void insert_hint_end(Set & xs, int x) {
    xs.insert(xs.end(), x);
}

void insert_hint_one_before_end(Set & xs, int x) {
    xs.insert(--xs.end(), x);
}

int main() {
    const int cnt = 40000000;
    Set xs;
    xs.insert(0);
    for (int i = 1; i < cnt; i++) {
        //insert_standard(xs, i);
        //insert_hint_one_before_end(xs, i);
        insert_hint_end(xs, i);
    }

    return 0;
}