Is vector::insert allowed to reserve only once and

2019-02-12 00:57发布

问题:

vector::insert(dst_iterator, src_begin, src_end) (insert a range) can be optimized for random-access iterators to reserve the required capacity src_end - src_begin first, then perform the copy.

The main question I have: Does the Standard also allow vector::insert to avoid a capacity check for each copied element? (I.e. not using push_back or similar on every element to be inserted)

I'll refer to avoiding this capacity check as "optimization of insert".


What could go wrong: I can imagine an iterator with side effects when dereferenced:

Note: the Standard guarantees the iterators passed to insert will be dereferenced exactly once (see end of question).

#include <vector>
#include <iterator>
#include <iostream>

template < typename T >
struct evil_iterator : std::iterator < std::random_access_iterator_tag, T >
{
    using base = std::iterator < std::random_access_iterator_tag, T >;

    std::vector<T>* evil_feedback;
    typename std::vector<T>::iterator innocent_iterator;

    evil_iterator( std::vector<T>* c,
                   typename std::vector<T>::iterator i )
        : evil_feedback{c}
        , innocent_iterator{i}
    {}

    void do_evil()
    {
        std::cout << "trying to do evil; ";
        std::cout << "cap: " << evil_feedback->capacity() << ", ";
        std::cout << "size: " << evil_feedback->size() << ", ";

        // better not invalidate the iterators of `*evil_feedback`
        // passed to the `insert` call (see example below)
        if( evil_feedback->capacity() > evil_feedback->size() )
        {
            evil_feedback->push_back( T{} );
            // capacity() might be == size() now
            std::cout << "successful >:]" << std::endl;
        }else
        {
            std::cout << "failed >:[" << std::endl;
        }
    }

    T& operator*()
    {
        do_evil();  // <----------------------------------------
        return *innocent_iterator;
    }


    // non-evil iterator member functions-----------------------

    evil_iterator& operator++()
    {
        ++innocent_iterator;
        return *this;
    }
    evil_iterator& operator++(int)
    {
        evil_iterator temp(*this);
        ++(*this);
        return temp;
    }


    evil_iterator& operator+=(typename base::difference_type p)
    {
        innocent_iterator += p;
        return *this;
    }
    evil_iterator& operator-=(typename base::difference_type p)
    {
        innocent_iterator -= p;
        return *this;
    }

    evil_iterator& operator=(evil_iterator const& other)
    {
        evil_feedback = other.evil_feedback;
        innocent_iterator = other.innocent_iterator;
        return *this;
    }

    evil_iterator operator+(typename base::difference_type p)
    {
        evil_iterator temp(*this);
        temp += p;
        return temp;
    }
    evil_iterator operator-(typename base::difference_type p)
    {
        evil_iterator temp(*this);
        temp -= p;
        return temp;
    }

    typename base::difference_type operator-(evil_iterator const& p)
    {
        return this->innocent_iterator - p.innocent_iterator;
    }

    bool operator!=(evil_iterator const& other) const
    {  return innocent_iterator != other.innocent_iterator;  }
};

Example:

int main()
{
    std::vector<int> src = {3, 4, 5, 6};
    std::vector<int> dst = {1, 2};

    evil_iterator<int> beg = {&dst, src.begin()};
    evil_iterator<int> end = {&dst, src.end()};

    // explicit call to reserve, see below
    dst.reserve( dst.size() + src.size() );
    // using dst.end()-1, which stays valid during `push_back`,
    //   thanks to Ben Voigt pointing this out
    dst.insert(dst.end()-1, beg, end);  // <--------------- doing evil?

    std::copy(dst.begin(), dst.end(), 
              std::ostream_iterator<int>{std::cout, ", "});
}

Questions:

  1. Can vector::insert be optimized to avoid a capacity check for each inserted element?
  2. Is evil_iterator still a valid iterator?
  3. If so, is evil_iterator evil, i.e. can it result in UB / non-complying behaviour if insert is optimized as described above?

Maybe my do_evil is not evil enough.. have no problems on clang++ 3.2 (using libstdc++):

Edit 2: Added the call to reserve. Now, I'm doing evil :)

trying to do evil; cap: 6, size: 2, successful >:]
trying to do evil; cap: 6, size: 3, successful >:]
trying to do evil; cap: 6, size: 4, successful >:]
trying to do evil; cap: 6, size: 9, failed >:[
1, 3, 4, 5, 6, 0, 0, 135097, 2,

Edit: Why I think the optimization could break this:

  1. Consider dst.size() == dst.capacity() == 2 at the beginning.
  2. The call to insert requires a new capacity of 6.
  3. The optimization enlarges the capacity to exactly 6, then begins to insert the elements by copying from the src iterators (beg, end).
  4. This copying is done within a loop where no capacity checks occur. (That is the optimization.)
  5. During the process of copying, further elements are added to the vector (w/o invalidating the iterators), in do_evil. The capacity now is not sufficient any more to hold the rest of the elements to be copied.

Maybe you had to use reserve in the example explicitly to force updating the observable capacity before using do_evil. Currently, insert could reserve some capacity but change what capacity returns (i.e. observable capacity) only after the copying is done.


What I've found in the Standard so far seems to allow the optimization of insert:

[sequence.reqmts]/3

a.insert(p,i,j) [...]

Requires: T shall be EmplaceConstructible into X from *i.

For vector, if the iterator does not meet the forward iterator requirements (24.2.5), T shall also be MoveInsertable into X and MoveAssignable. Each iterator in the range [i,j) shall be dereferenced exactly once.

pre: i and j are not iterators into a. Inserts copies of elements in [i, j) before p

[vector.modifiers] on insert

1 Remarks: Causes reallocation if the new size is greater than the old capacity. If no reallocation happens, all the iterators and references before the insertion point remain valid. If an exception is thrown other than by the copy constructor, move constructor, assignment operator, or move assignment operator of T or by any InputIterator operation there are no effects. If an exception is thrown by the move constructor of a non-CopyInsertable T, the effects are unspecified.

2 Complexity: The complexity is linear in the number of elements inserted plus the distance to the end of the vector.

回答1:

Looking again, I think this rule (section 17.6.4.9) is a clearer prohibition on what you tried to do:

Each of the following applies to all arguments to functions defined in the C++ standard library, unless explicitly stated otherwise.

  • If an argument to a function has an invalid value (such as a value outside the domain of the function or a pointer invalid for its intended use), the behavior is undefined.

I think this rule applies during the entire duration of the function call, and not only at function entry.

Furthermore, push_back() guarantees that (23.3.7.5):

If no reallocation happens, all the iterators and references before the insertion point remain valid.

Your position passed to insert, which is dst.end() as evaluated before the insert call, is not before the insertion point of the first evil_feedback->push_back() call, so it does not remain valid (the fact that you carefully avoided reallocation here does not save you, as you only met half the condition). Which means the argument you passed to std::vector::insert, a function defined in the C++ Standard Library, is invalid during the duration of that call, landing you squarely in the realm of undefined behavior.


Previous answer:

I think you violated this precondition that you quoted:

pre: i and j are not iterators into a.



回答2:

(Note: This is more of a comment, I'm using an answer to allow formatting and longer content. Marking CW because comments should not receive rep)

I believe this is a correct algorithm that avoids O(NM) complexity, if input iterators are random-access:

  1. Determine the size of the range to insert (only possible for random-access iterators).
  2. Reserve additional space.
  3. Adjust size.
  4. Move-construct the new tail elements.
  5. Move-assign the other intervening elements toward the new end.
  6. Copy the source elements into the range left empty by the move.


回答3:

Here are my views:

  1. Yes; de-referencing can have side effects on your vector (case in point) which could lead to undefined behavior in some cases, but this should not be the case with standard conforming iterators.
  2. No; Iterator's are intended as a generalization of pointers - since pointers de-referencing may not have side effects (cannot find reference) the same should be the case for iterators [iterator.requirements.general]. Given this interpretation "optimization of insert" (1) is valid.