Should my std::vector contain pointers or structs?

Posted 2019-03-26 13:49

Question:

I know that holding pointers incurs the overhead of an extra dereference operation but it saves me including the (potentially large) header file that contains the definition of my struct.

However my preference is to be determined by the advantage of having a std::vector<myStruct> *ptr2Vect member. Namely, not having to call delete on each element. How big a performance advantage is this? Can vector really allocate objects on the stack? I am fairly new to template classes and wonder if it could be possible for a dynamic array to expand on the stack and at what price?

_EDIT_

I do not fully understand the default copy constructor and operator= members, and I am trying to keep things as simple structs. I have defined neither explicitly, so I fear that storing objects instead of pointers in the vector will create a temporary object at assignment time that will be destroyed and so ruin its copy.

_EDIT_

Sorry for the delay in delivering pertinent information (I am shy with my code).

I want to call push_back(newObj). Now if I don't use pointers I have a big problem in that I don't want to perform a deep copy but my dtor will free up the memory shared by the LHS and RHS of this invocation of the copy constructor.

Answer 1:

As a general rule of thumb I'd say you probably don't want to put pointers in your containers, unless there's a good reason.

Possible reasons to consider pointers:

You have virtual functions
You have a class hierarchy
You don't know the size of the objects at the point where you're using them. (You can only use pointers or references in that case, and you can't have a vector of references.)
Your objects are exceedingly large (probably benchmark this)

The biggest reason not to put pointers in containers would be that it makes it much easier not to make a mistake and accidentally leak memory. This is especially true when you start to consider exceptions.

Not having pointers in your containers makes it much easier to use the STL algorithms from <algorithm>. Consider:

#include <vector>
#include <string>
#include <iostream>
#include <iterator>
#include <algorithm>

int main() {
  std::vector<std::string> test;
  test.push_back("hello world");
  std::copy(test.begin(), test.end(), 
            std::ostream_iterator<std::string>(std::cout, "\n"));
}

Versus:

#include <vector>
#include <string>
#include <iostream>
#include <iterator>
#include <algorithm>

int main() {
  std::vector<std::string*> test;
  // if push_back throws then this will leak:
  test.push_back(new std::string("hello world"));
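  // Can't do this (it does not compile: the elements are std::string*, not std::string):
  // std::copy(test.begin(), test.end(),
  //           std::ostream_iterator<std::string>(std::cout, "\n"));
  // The string allocated above is never deleted, so it leaks too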
}

(which I would never do)

Or possibly:

#include <vector>
#include <string>
#include <iostream>
#include <iterator>
#include <algorithm>

int main() {
  std::vector<std::string*> test;
  std::string str("hello world");
  test.push_back(&str);
  // Can't do this (it does not compile: the elements are std::string*, not std::string):
  // std::copy(test.begin(), test.end(),
  //           std::ostream_iterator<std::string>(std::cout, "\n"));
}

But the semantics of this one make me feel uncomfortable - it's not clear at all that delete elsewhere in the code would be a very bad thing and you still can't use STL algorithms very comfortably even if there is no leak issue.

Answer 2:

The "overhead" of a pointer dereference is essentially zero. That is, you would have great difficulty measuring it against accessing the object in place. Beware of premature optimization and over-optimization, the root of all programming evil.

You should do whichever (pointer or object) makes the most application sense.

Answer 3:

First, I agree with those who say to write your code however it makes the most sense, and do not worry about micro-optimizations like this until your profiling tool tells you to.

That said, you should be worried more about accessing your data, not allocating and freeing it. If your access patterns to vector elements have good locality -- e.g., looping through them all, or accessing nearby elements together -- then a vector of pointers is likely to destroy that locality and cause a major hit to performance.

The #1 concern for speed is using good algorithms, of course. But the #2 concern is having good locality, because memory is slow... And relative to the CPU, it gets slower every year.

So, for small, simple objects, vector<Obj> is almost certainly going to be faster than vector<Obj *>, and possibly much faster.

As for "can a vector really allocate objects on the stack", the answer is yes in terms of semantics, but no in terms of implementation (most likely). A typical vector implementation consists of three pointers internally: Base, Current, and End. All three point into a contiguous block on the heap, and the vector's destructor will deallocate that block. (Again, this is a typical implementation; in theory, your compiler and/or runtime might do something else. But I bet it doesn't.)

Such an implementation supports dynamic expansion by re-allocating that block and copying data. This is not as slow as it sounds for two reasons: (1) Linear memory access (e.g. copying) is pretty fast; and (2) each reallocation increases the size of the block by a factor, which means push_back is still O(1) amortized.

Answer 4:

The one thing not mentioned in the pointers-versus-structs comparison is contiguity of memory (it matters more on embedded systems). Basically, a vector of structs is allocated in one block of memory, while the structs behind a vector of pointers will (probably) be scattered all over the heap. Memory fragmentation and data cache performance will thus suffer seriously.

Answer 5:

Your question is not very clear. First you talk of a vector of pointers, and then you write something like: std::vector<myStruct> *ptr2Vect.

std::vector<myStruct> *ptr2Vect is a pointer to a vector of myStruct objects. The vector is not storing pointers, so you don't need to worry about memory management of the objects it holds; you just need to ensure that myStruct is copy constructible. You do need to manually manage the cleanup of the pointer to the vector (ptr2Vect), though.
Most modern systems work very efficiently with pointers. If you are asking this kind of question, you are heading toward premature optimization: stop and take a step back.
vector relies on dynamic allocation, but it manages how it expands by itself (you can control this to an extent; for example, if you know the size beforehand, you can call reserve).

From what I gather from the question, you're really not at the point where you need to worry about the overhead of automatic/dynamic allocation and pointer de-referencing. These are the least of your concerns, just learn to write good code - all that other stuff will come later (if at all necessary)

Answer 6:

However my preference is to be determined by the advantage of having a std::vector<myStruct> *ptr2Vect member. Namely, not having to call delete on each element. How big a performance advantage is this?

it depends on the number of elements, but it can save you a ton of memory and time. see:

What is the cost of inheritance?

Can vector really allocate objects on the stack?

yes. a vector could reserve an internal allocation for this purpose, or the compiler could optimize this in some cases. that's not a feature/optimization you should rely on. you could create your own allocator or pod-array container tailored for your needs.

I am fairly new to template classes and wonder if it could be possible for a dynamic array to expand on the stack and at what price?

if you have a constant size, then a specific implementation (such as boost::array) can save a ton of runtime overhead. i've written several types for different contexts.

Answer 7:

My first suggestion would be that you don't need a pointer-to-vector as a member. You want a simple vector<T> myVector; or vector<T*> myVector;

Secondly, you will make your decision based on following questions

What is the size of vector ? (how many elements max) say n
What is the size of struct ? (sizeof(T)) say s
What is the cost of copying the struct ? say c
Is your struct holding some resource (e.g. a file handle or a semaphore, et cetera)? If it is holding some resource, then vector<T> can complicate your life much more.

Now n, s, and c determine the runtime overhead of the vector. For vector<T*>, the cost due to n, s, c is zero. For vector<T>, the cost due to n, s, c is n*s sizeUnits + n*c executionUnits.

My own rule of thumb: no rule of thumb exists. First code it with vector<T>; if that is not good enough, then go with vector<T*>.

If your program is a small one that exits the process after you have used your vector, then I wouldn't even bother to free them. If not, then just run a loop such as:
for(auto it=v.begin();it!=v.end();++it) delete *it;