For simplicity, I'll stick to vector<int>
but I think this applies to any vector<T>
object.
If I am using a vector<int>::iterator
to keep track of some position in a vector of int's and then I use vector<int>::push_back()
, the iterator becomes worthless. Meaning, I can't take its address with &
or dereference it. The direct reason made sense once I printed the address of some of the objects in the following sense:
vector<int> my_vec(1); //my_vec[0] = 0
vector<int>::iterator it = my_vec.begin(); //it -> my_vec[0], *it = my_vec[0] = 0
cout << "&my_vec = " << &my_vec << "\n";
cout << "&my_vec[0] = " << &my_vec[0] << "\n";
cout << "&it = " << &it << "\n"; //prints address, all good
cout << "*it = " << *it << "\n"; //prints 0, all good
cout << "\n\n" << pushing back..." << "\n\n";
my_vec.push_back(1);
cout << "&my_vec = " << &my_vec << "\n"; //same as before push_back()!
cout << "&my_vec[0] = " << &my_vec[0] << "\n"; //different from before push_back()!!
cout << "&it = " << &it << "\n"; //same as before push_back()
//cannot do &it or *it
So obviously the address of it
doesn't change but push_back()
has moved things around in memory and now the address of the different "elements" of my_vec
are changed. The fact that my_vec[i] has a new address made sense to me but then I have the following questions:
1) Why doesn't the address of my_vec
change? It seems that if push_back()
causes the addresses of the my_vec[i]
to change, it should also change the address of the whole object. For an array, my_array
is a pointer to my_array[0]
so I can imagine an operation changing the addresses of each my_array[i]
and updating the pointer to point to the new address of my_array[0]
but the address of the pointer my_array
as an object in and of itself wouldn't change. But my_vec
is not a pointer in any sense to my_vec[0]
so I am confused why the addresses of the my_vec[i]
would change but not the object my_vec
.
2) Why would any operation internal to vector<int>
that changes the address of the my_vec[i]
(such as push_back()
) not also properly "update" any iterators? This seems like a good idea? No?
3) Given that #2 is what it is and my iterators become worthless when I call push_back()
, what is the correct way to deal with this? Should I not use iterators if I need to use push_back()
? If someone is going to complain about what my use-case is for using iterators and push_back()
, I excluded it for brevity but it was basically implementing a stack using vector<int>
and I was using an iterator to keep track of the top of the stack. Since I didn't want a fixed size to start, I tried to use push_back()
to enlarge the stack when my iterator hit my_vec.end()
. But I think this is a valid question in general.
Thanks very much for your help!
Why doesn't the address of my_vec
change?
Because the vector object itself is still the same object at the same address. Reallocation changes the address of the dynamic array that it manages, not the vector object that manages the array.
Why would any operation internal to vector<int>
that changes the address of the my_vec[i]
(such as push_back()
) not also properly "update" any iterators? This seems like a good idea? No?
That would have a (perhaps large) runtime cost. Either the vector would have to track all the iterators (requiring dynamic storage, memory allocation each time an iterator is created, and updating of all vectors when the vector changes) or each iterator would need a reference to the container, checked on every access, and could not be implemented as a simple pointer. C++ generally avoids runtime costs where possible - especially in cases like this, where they are nearly always unnecessary.
what is the correct way to deal with this?
There are various options. You could store an index rather than an iterator. You could use a container such as std::list
with stable iterators (although that might be rather less efficient). If you can place an upper bound on the size of the array, you could reserve that amount so that reallocation won't be necessary. You could write your own container (or adapter) that automatically updates iterators. For your stack, if it is indeed a stack, you shouldn't need to track anything other than the end of the vector, so there's no need to store an iterator at all. Or you could use std::stack
rather than reinventing it.
Are there any other vector<T>
member functions (besides the obvious ones) that have this effect on iterators?
Iterators are invalidated by reallocation, when any operation causes the vector to grow beyond its current capacity. You can control the capacity with the reserve
function.
Also, erasing elements will invalidate any iterator referring to the erased elements, or elements later in the sequence.
Why doesn't the address of my_vec change?
A reallocation of memory in a std::vector
happens to its internal buffer that holds its elements and not the object itself (i.e., not the std::vector
itself). Consider the following toy example:
template<typename T>
class vector {
T *buffer;
...
public:
...
};
If I define a vector
object (e.g., vector<int> v
), the object v
has an address. When a reallocation happens due to an insertion or an erasure it's not the address of v
that changes but rather the value of its member variable buffer
(i.e., buffer
would point to a new address location).
Why would any operation internal to vector that changes the address of the my_vec[i] (such as push_back()) not also properly "update" any iterators?
The issue of iterator invalidation is well known and you should take your precautions when you have to deal with such situations.
Are there any other vector member functions (besides the obvious ones) that have this effect on iterators? Does it depend on T?
Yes there are (e.g., std::vector::insert
, std::vector::erase
). No it doesn't depend on T
(i.e., the type of the std::vector
's element).