How can I efficiently select a Standard Library co

2019-01-01 16:56发布

问题:

There\'s a well known image (cheat sheet) called \"C++ Container choice\". It\'s a flow chart to choose the best container for the wanted usage.

Does anybody know if there\'s already a C++11 version of it?

This is the previous one: \"eC++

回答1:

Not that I know of, however it can be done textually I guess. Also, the chart is slightly off, because list is not such a good container in general, and neither is forward_list. Both lists are very specialized containers for niche applications.

To build such a chart, you just need two simple guidelines:

  • Choose for semantics first
  • When several choices are available, go for the simplest

Worrying about performance is usually useless at first. The big O considerations only really kick in when you start handling a few thousands (or more) of items.

There are two big categories of containers:

  • Associative containers: they have a find operation
  • Simple Sequence containers

and then you can build several adapters on top of them: stack, queue, priority_queue. I will leave the adapters out here, they are sufficiently specialized to be recognizable.


Question 1: Associative ?

  • If you need to easily search by one key, then you need an associative container
  • If you need to have the elements sorted, then you need an ordered associative container
  • Otherwise, jump to the question 2.

Question 1.1: Ordered ?

  • If you do not need a specific order, use an unordered_ container, otherwise use its traditional ordered counterpart.

Question 1.2: Separate Key ?

  • If the key is separate from the value, use a map, otherwise use a set

Question 1.3: Duplicates ?

  • If you want to keep duplicates, use a multi, otherwise do not.

Example:

Suppose that I have several persons with a unique ID associated to them, and I would like to retrieve a person data from its ID as simply as possible.

  1. I want a find function, thus an associative container

    1.1. I couldn\'t care less about order, thus an unordered_ container

    1.2. My key (ID) is separate from the value it is associated with, thus a map

    1.3. The ID is unique, thus no duplicate should creep in.

The final answer is: std::unordered_map<ID, PersonData>.


Question 2: Memory stable ?

  • If the elements should be stable in memory (ie, they should not move around when the container itself is modified), then use some list
  • Otherwise, jump to question 3.

Question 2.1: Which ?

  • Settle for a list; a forward_list is only useful for lesser memory footprint.

Question 3: Dynamically sized ?

  • If the container has a known size (at compilation time), and this size will not be altered during the course of the program, and the elements are default constructible or you can provide a full initialization list (using the { ... } syntax), then use an array. It replaces the traditional C-array, but with convenient functions.
  • Otherwise, jump to question 4.

Question 4: Double-ended ?

  • If you wish to be able to remove items from both the front and back, then use a deque, otherwise use a vector.

You will note that, by default, unless you need an associative container, your choice will be a vector. It turns out it is also Sutter and Stroustrup\'s recommendation.



回答2:

I like Matthieu\'s answer, but I\'m going to restate the flowchart as this:

When to NOT use std::vector

By default, if you need a container of stuff, use std::vector. Thus, every other container is only justified by providing some functionality alternative to std::vector.

Constructors

std::vector requires that its contents are move-constructible, since it needs to be able to shuffle the items around. This is not a terrible burden to place on the contents (note that default constructors are not required, thanks to emplace and so forth). However, most of the other containers don\'t require any particular constructor (again, thanks to emplace). So if you have an object where you absolutely cannot implement a move constructor, then you will have to pick something else.

A std::deque would be the general replacement, having many of the properties of std::vector, but you can only insert at either ends of the deque. Inserts in the middle require moving. A std::list places no requirement on its contents.

Needs Bools

std::vector<bool> is... not. Well, it is standard. But it\'s not a vector in the usual sense, as operations that std::vector normally allows are forbidden. And it most certainly does not contain bools.

Therefore, if you need real vector behavior from a container of bools, you\'re not going to get it from std::vector<bool>. So you\'ll have to make due with a std::deque<bool>.

Searching

If you need to find elements in a container, and the search tag can\'t just be an index, then you may need to abandon std::vector in favor of set and map. Note the key word \"may\"; a sorted std::vector is sometimes a reasonable alternative. Or Boost.Container\'s flat_set/map, which implements a sorted std::vector.

There are now four variations of these, each with their own needs.

  • Use a map when the search tag is not the same thing as the item you\'re looking for itself. Otherwise use a set.
  • Use unordered when you have a lot of items in the container and search performance absolutely needs to be O(1), rather than O(logn).
  • Use multi if you need multiple items to have the same search tag.

Ordering

If you need a container of items to always be sorted based on a particular comparison operation, you can use a set. Or a multi_set if you need multiple items to have the same value.

Or you can use a sorted std::vector, but you\'ll have to keep it sorted.

Stability

When iterators and references are invalidated is sometimes a concern. If you need a list of items, such that you have iterators/pointers to those items in various other places, then std::vector\'s approach to invalidation may not be appropriate. Any insertion operation may cause invalidation, depending on the current size and capacity.

std::list offers a firm guarantee: an iterator and its associated references/pointers are only invalidated when the item itself is removed from the container. std::forward_list is there if memory is a serious concern.

If that\'s too strong a guarantee, std::deque offers a weaker but useful guarantee. Invalidation results from insertions in the middle, but insertions at the head or tail causes only invalidation of iterators, not pointers/references to items in the container.

Insertion Performance

std::vector only provides cheap insertion at the end (and even then, it becomes expensive if you blow capacity).

std::list is expensive in terms of performance (each newly inserted item costs a memory allocation), but it is consistent. It also offers the occasionally indispensable ability to shuffle items around for virtually no performance cost, as well as to trade items with other std::list containers of the same type at no loss of performance. If you need to shuffle things around a lot, use std::list.

std::deque provides constant-time insertion/removal at the head and tail, but insertion in the middle can be fairly expensive. So if you need to add/remove things from the front as well as the back, std::deque might be what you need.

It should be noted that, thanks to move semantics, std::vector insertion performance may not be as bad as it used to be. Some implementations implemented a form of move semantic-based item copying (the so-called \"swaptimization\"), but now that moving is part of the language, it\'s mandated by the standard.

No Dynamic Allocations

std::array is a fine container if you want the fewest possible dynamic allocations. It\'s just a wrapper around a C-array; this means that its size must be known at compile-time. If you can live with that, then use std::array.

That being said, using std::vector and reserveing a size would work just as well for a bounded std::vector. This way, the actual size can vary, and you only get one memory allocation (unless you blow the capacity).



回答3:

Here is the C++11 version of the above flowchart. [originally posted without attribution to its original author, Mikael Persson]

\"\"



回答4:

Here\'s a quick spin, although it probably needs work

Should the container let you manage the order of the elements?
Yes:
  Will the container contain always exactly the same number of elements? 
  Yes:
    Does the container need a fast move operator?
    Yes: std::vector
    No: std::array
  No:
    Do you absolutely need stable iterators? (be certain!)
    Yes: boost::stable_vector (as a last case fallback, std::list)
    No: 
      Do inserts happen only at the ends?
      Yes: std::deque
      No: std::vector
No: 
  Are keys associated with Values?
  Yes:
    Do the keys need to be sorted?
    Yes: 
      Are there more than one value per key?
      Yes: boost::flat_map (as a last case fallback, std::map)
      No: boost::flat_multimap (as a last case fallback, std::map)
    No:
      Are there more than one value per key?
      Yes: std::unordered_multimap
      No: std::unordered_map
  No:
    Are elements read then removed in a certain order?
    Yes:
      Order is:
      Ordered by element: std::priority_queue
      First in First out: std::queue
      First in Last out: std::stack
      Other: Custom based on std::vector????? 
    No:
      Should the elements be sorted by value?
      Yes: boost::flat_set
      No: std::vector

You may notice that this differs wildly from the C++03 version, primarily due to the fact that I really do not like linked nodes. The linked node containers can usually be beat in performance by a non-linked container, except in a few rare situations. If you don\'t know what those situations are, and have access to boost, don\'t use linked node containers. (std::list, std::slist, std::map, std::multimap, std::set, std::multiset). This list focuses mostly on small and middle sided containers, because (A) that\'s 99.99% of what we deal with in code, and (B) Large numbers of elements need custom algorithms, not different containers.