C++: vector of pointer loses the reference after p

2019-04-04 12:43发布

问题:

In my code a have a global vector of Node object and a local vector of Node pointers:

#include<cstdio>
#include<cstdlib>
#include<vector>

using namespace std;

class Node {
    int n;

public:
    Node(int i) : n(i);
    int getN() { return n; }
};

vector<Node> v;

int main() {
    vector<Node*> p;
    v.push_back(Node(1));
    p.push_back(&v[0]);
    printf("first node id : %d\n", (*p[0]).getN());

    return 0;
}

I inserted a node object to global vector & inserted the pointer of that object in the local vector. Output of my above code is:

first node id : 1

However, if I change my main function to this:

int main()
{
    vector<Node*> p;
    v.push_back(Node(1));
    p.push_back(&v[0]);
    v.push_back(Node(2));
    p.push_back(&v[1]);
    printf("first node id : %d\n", (*p[0]).getN());

    return 0;
}

The code prints a garbage value:

first node id : 32390176

I can't figure out the problem. Does the vector data structure changes the references of each object after an insertion ? How can I fix this ?

回答1:

"Does a vector change the references after an insertion?"

Possibly, yes. An std::vector may reallocate its (heap) storage when you add/push_back() additional elements, invalidating all pointers:

Iterator [read: Pointer] Invalidation

(for operations) push_back, emplace_back ... If the vector changed capacity, all of them [i.e. all iterators are invalidated]. If not, only end().

"How can I fix this?"

The above invalidation rule does not apply if a vector's capacity does not change due to an insertion - since vectors do not reallocate storage unnecessarily. So if you pre-set your vector's capacity to 2 in your example (say, with v.reserve(2)), the pointer will remain valid. If you don't know the size in advance, but you can delay the construction of the second vector (with the pointers), you don't have to reserve, you'll just have the size after inserting the last element.

The approaches above are highly unrecommended, however. If you were to make your vector constant - at least in the scope of a function in which you would construct and use the second vector - you would have a strong guarantee of non-reallocation. Alternatively, if you could determine the size in advance you might use an std::array, and it would be more fitting to use pointers into that container's storage:

Iterator Invalidation

As a rule, iterators to an array are never invalidated throughout the lifetime of the array.

You might also consider storing indices into your vector (although there, as well, the vector might shrink, invalidating the indices, or you might insert elements in the middle etc).

Anyway, I suspect that you might not actually want to do any of this, i.e. it seems to be a not-so-good solution to a problem which could be handled with a different approach altogether.

PS - If the vector has a custom allocator then everything I've written might be irrelevant.



回答2:

What you are doing is undefined behavior for your vector p because the vector v can change where it's objects are stored.

A std::vector's memory is contiguous, so it may, after a number of push_backs, have to allocate a new block memory and copy it's contents to the new block. This will invalidate all the pointers that happened to point to the old memory location.



回答3:

Yes, a push_back() on a vector invalidates all references (and pointers) to elements in that vector if it has to reallocate. There are various ways to work around this. If you know that your vector will have a particular number of nodes, you can use reserve(). In your example, you could reserve two elements:

int main()
{
    v.reserve(2);
    .
    .
    .
}

This will make sure the vector has preallocated enough storage so that it doesn't need to reallocate.

If you don't know the size ahead of time, then you'll have to change something about your approach. You might use a std::deque instead of a std::vector, since std::deque doesn't invalidate references when you use push_back(). You might store indices instead of pointers. Or you might need to push all the nodes into your vector before making the pointers.

int main()
{
    v.push_back(Node(1));
    v.push_back(Node(2));

    vector<Node*> p;
    p.push_back(&v[0]);
    p.push_back(&v[1]);

    printf("first node id : %d\n", (*p[0]).getN());

    return 0;
}


回答4:

You have stumbled upon one of the great known "dark corners" of C++: the great "iterator invalidation." Definitely familiarize yourself intimately with this:

Iterator invalidation rules

In particular, you are hitting this reality:

vector: all iterators and references before the point of insertion are unaffected, unless the new container size is greater than the previous capacity (in which case all iterators and references are invalidated) [23.2.4.3/1]

(the emphasis is mine)

Now, about your issue. You can either make sure the vector never re-allocates. Or, you can use a different container that does not have this issue. There are compromises among all the container types, depending on your needs. Check out the other question thoroughly and make an informed decision.