Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with
sequenced
andhashed_unique
indexes is the same data structure asLinkedHashSet
.Failing that, keep an
unordered_set
(orhash_set
, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.The problems with what you're currently doing (
set
andvector
) are:mutable
data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).LinkedHashSet
, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.set
has different performance characteristics from a hash set.If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
The way you described your combination of
std::set
andstd::vector
sounds like what you should be doing, except by usingstd::unordered_set
(equivalent to Java's HashSet) andstd::list
(doubly-linked list). You could also usestd::unordered_map
to store the key (for lookup) along with an iterator into the list where to find the actual objects you store (if the keys are different from the objects (or only a part of them)).The boost library does provide a number of these types of combinations of containers and look-up indices. For example, this bidirectional list with fast look-ups example.
To replicate LinkedHashSet from JAVA in C++, I think you will need two vanilla
std::map
(please note that you will get LinkedTreeSet rather than the real LinkedHashSet instead which will get O(log n) for insert and delete) for this to work.When you are going to insert, you use
std::map::find
in the firststd::map
to make sure that there is no identical object exists in it.std::map
I mentioned before.When you are going to iterate through this by order of insertion, you iterate through the second
std::map
since it will be sorted by insertion order (anything that falls into thestd::map
orstd::set
will be sorted automatically).When you are going to remove an element from it, you use
std::map::find
to get the order of insertion. Using this order of insertion to remove the element from the secondstd::map
and remove the object from the first one.Please note that this solution is not perfect, if you are planning to use this on the long-term basis, you will need to "compact" the insertion order after a certain number of removals since you will eventually run out of insertion order (2^32 indexes for unsigned int or 2^64 indexes for unsigned long long int). In order to do this, you will need to put all the "value" objects into a vector, clear all values from both maps and then re-insert values from vector back into both maps. This procedure takes O(nlogn) time.
If you're using C++11, you can replace the first
std::map
withstd::unordered_map
to improve efficiency, you won't be able to replace the second one with it though. The reason is thatstd::unordered map
uses a hash code for indexing so that the index cannot be reliably sorted in this situation.You might wanna know that std::map doesn't give you any sort of (log n) as in "null" lookup time. And using std::tr1::unordered is risky business because it destroys any ordering to get constant lookup time.
Try to bash a boost multi index container to be more freely about it.