Does insertion of elements in a vector damages a p

2019-02-17 11:46发布

问题:

In a program to simulate logic gates I switched from using arrays

node N[1000];

to vectors

vector<node> N;

And my program did work perfectly before using vectors but now it prints wrong results, so I tried debugging and I found out that the bug happens here:

node* Simulator::FindNode(string h)
{
    int i;
    for(i = 0; i < NNodes; i++)
    {
        if (N[i].getname() == h)
        {
            return &N[i];
        }
    }

    node n ;
    N.push_back(n);
    N[NNodes].setname(h);
    NNodes++;
    return &N[NNodes-1]; //why?because of NNodes++  
}

// ...

node* inp1;
node* inp2;
node* out;
string NodeName;

inp_file >> NodeName;
inp1 = FindNode(NodeName);
s1 = inp1;

inp_file >> NodeName;
inp2 = FindNode(NodeName); //inp1 is destroyed here 

inp_file >> NodeName;
out = FindNode(NodeName); //inp2 and inp1 are destroyed here 

When calling FindNode for the 1st time, the 1st pointer inp1 points to the right place which is &N[0].

When calling FindNode for the second time the 1st pointer inp1 points to rubbish and the second pointer inp2 points to the right place &N[1].

When calling FindNode for the 3rd time the both the 1st and 2nd pointers (inp1, inp2) point to rubbish! And 3rd pointer out points to the right place.

Why would that happen?
How does vector work when I insert items to them and which kind of pointers should I use to point to vectors items?

回答1:

A few things.

First, as far as I can tell NNodes is just tracking the size. But you have std::vector::size() for that. You then use it to get the last inserted element, but you can just use std::vector::back() for that: return &N.back();.

Also your parameter is being passed by value, when it should probably be passed by const-reference: const string& h. This avoids unnecessary copies, and in general* you should pass things by const-reference instead of by-value.

And this is bad:

node n;
N.push_back(n);
N[NNodes].setname(h);

node should probably have a constructor that takes a const string& and sets the name during initialization. That way you can never have a node without a name, as in:

node n(h);
N.push_back(n);

Or more terse:

N.push_back(node(h));

Much better.

Second, yes, vector can invalidate pointers to elements; namely, whenever the capacity of the vector needs to be increased. If you can, reserve() the capacity up front to avoid re-allocations. In your case you cannot, so you can go two different routes.

The first route is a level of indirection. Instead of pointing directly at things, get their index into the array. Note that while their address may change, their location within the vector will not. You would have Simulator::FindNode return a size_t, and return N.size() - 1. Add a member like node& GetNode(size_t index), which just does return N[index]; (will error checking if you wish). Now whenever you need a member, hand the index to that member to GetNode and you'll get a reference to that node back.

The other route is to change your container. You can use a deque, for example. This does not have contiguous storage, but it's much like vector. push_back and pop_back are still O(1), and it still has good cache-coherence. (And by the way, deque trades contiguous storage for the ability to push_front and pop_front in O(1) time as well)

The important thing is that deque will not invalidate pointers during a push or pop operation from either end. It works by a sort of vector-list hybrid, where you get chunks of storage for elements linked together. Change your underlying storage to deque (and don't take or put anything in the middle), and you can point to things just fine.

However, from what I can tell you have a terribly inefficient map. You're mapping names to nodes. You should probably just use std::map, which has the exact interface you're trying to recreate. You can even point to any element in a map, which never invalidates things.

*The rule is, pass by const-reference unless the type is primitive (built-in like int, double, etc.), if the types size is less than sizeof(void*), or if you are going to need a copy of it anyway.

That is, don't do this:

void foo(const std::string& s)
{
    std::string ss(s); // make a copy, use copy
}

But do this:

void foo(std::string s) // make a copy, use copy
{
}



回答2:

Yes, it can re-allocate the entire buffer, making all pointers into the old location invalid.

You can limit this by preallocating, but that's really just a performance boost. The better way is to use indexes instead of raw pointers.



回答3:

When a vector grows, it is reallocated, which effectively invalidates all pointers to elements of the vector.

If you know beforehand how many elements you will have in the vector, you could use the reserve() method to preallocate space.



回答4:

Returning a pointer to an internal STL member is probably not the best idea. When you give an object to an STL container you are basically giving up control of it. You are telling the STL it can move it around as it sees fit to maintain the promises that the container gives you. Returning the index where the node is located is a better idea like Steven Sudit mentioned.

Once you get the index you could create a function that returns a copy of the contents of the node you are interested in. This way you also maintain data encapsulation with the STL container, not allowing anyone else to modify its contents.



回答5:

Yes, inserts will invalidate old pointers to vector elements on reallocation. If you want to use stable pointers really hard, you can switch from vector to deque. It offers a very similar interface to vector and can grow without reallocating and moves previous contents by allocating further chunks.

The price you pay for using a deque instead of a vector is one more level of indirection on random access. Depending on your usage, that may be totally irrelevant. You should iterate over deque the whole deque if necessary by using iterators. That will be as fast a iterating over a vector.

The gain is: zero reallocations!



回答6:

Imagine you needed to write the array-based code so that it could possibly re-size the array if necessary.

Imagine how you would do it.

Imagine what would happen to stale pointers.

Rewrite your code to use indices rather than pointers, or to guarantee that re-allocation doesn't happen, as appropriate.