Consider the following code:
    class A
    {
        B* b; // an A object owns a B object
        A() : b(NULL) { } // we don't know what b will be when constructing A
        void calledVeryOften(…)
        {
            if (b)
                delete b;
            b = new B(param1, param2, param3, param4);
        }
    };
My goal: I need to maximize performance, which, in this case, means minimizing the number of memory allocations.
The obvious thing to do here is to change `B* b;` to `B b;`. I see two problems with this approach:
- I need to initialize `b` in the constructor. Since I don't know what `b` will be, this means I need to pass dummy values to B's constructor. Which, IMO, is ugly.
- In `calledVeryOften()`, I'll have to do something like this: `b = B(…)`, which is wrong for two reasons:
  - The destructor of `b` won't be called.
  - A temporary instance of B will be constructed, then copied into `b`, then the destructor of the temporary instance will be called. The copy and the destructor call could be avoided. Worse, calling the destructor could very well result in undesired behavior.
So what solutions do I have to avoid using `new`? Please keep in mind that:
- I only have control over A. I don't have control over B, and I don't have control over the users of A.
- I want to keep the code as clean and readable as possible.
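To make the second concern concrete, here is a small sketch of what an assignment from a temporary actually executes. The `B` below is a hypothetical stand-in with a single `int` parameter (the real `B` isn't shown in the question), instrumented with counters; `Counts`, `run_once` and `check` are illustrative names as well.

```cpp
#include <cassert>

// Counters for the special member functions of the stand-in B.
struct Counts {
    int constructed;
    int assigned;
    int destroyed;
};

static Counts g_counts = {0, 0, 0};

// Hypothetical stand-in for B; the real class is not under our control.
struct B {
    int value;
    explicit B(int v) : value(v) { ++g_counts.constructed; }
    B(const B& other) : value(other.value) { ++g_counts.constructed; }
    B& operator=(const B& other) {
        value = other.value;
        ++g_counts.assigned;
        return *this;
    }
    ~B() { ++g_counts.destroyed; }
};

struct A {
    B b;
    A() : b(0) {}  // dummy value, as the question notes
    void calledVeryOften(int param) { b = B(param); }
};

// Constructs an A, reassigns b once, destroys the A, and reports the counts.
Counts run_once() {
    g_counts = Counts{0, 0, 0};
    {
        A a;                    // one construction (the dummy b)
        a.calledVeryOften(42);  // temporary constructed, assigned from, destroyed
    }                           // a.b destroyed here
    return g_counts;
}

// Self-check; call it from main() to try the sketch.
void check() {
    const Counts c = run_once();
    assert(c.constructed == 2);  // dummy b + temporary
    assert(c.assigned == 1);     // operator=, not a destructor, handles the old value
    assert(c.destroyed == 2);    // temporary + a.b at scope exit
}
```

So the old value of `b` is never destroyed at the point of reassignment; releasing whatever it held is left to B's `operator=`, and the temporary costs one extra construction and destruction per call.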
I liked Klaim's answer, so I wrote this up real fast. I don't claim perfect correctness, but it looks pretty good to me (i.e., the only testing it has had is the sample `main` below).
It's a generic lazy-initializer. The space for the object is allocated once, and the object starts at null. You can then `create`, overwriting previous objects, with no new memory allocations.
It implements all the necessary constructors, destructor, copy/assignment, swap, yadda-yadda. Here you go:
In your case, just create a member in your class: `lazy_object<B>` and you're done. No manual releases or making copy-constructors, destructors, etc. Everything is taken care of in your nice, small reusable class. :)
EDIT
Removed the need for vector, should save a bit of space and what-not.
EDIT2
This uses `aligned_storage` and `alignment_of` to use the stack instead of the heap. I used boost, but this functionality exists in both TR1 and C++0x. We lose the ability to copy, and therefore swap.
And there we go.
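For readers who want something to experiment with, here is a minimal sketch of a lazy initializer along the lines described above. It is not the listing this answer refers to: it assumes C++11, uses an `alignas` byte buffer where the answer used boost's `aligned_storage`, and the member names other than `create` (`destroy`, `get`), the stand-in `B`, the owner `A` and `check` are all hypothetical.

```cpp
#include <cassert>
#include <new>      // placement new
#include <utility>  // std::forward

// Illustrative sketch; names are assumptions, not the answer's original code.
template <typename T>
class lazy_object {
public:
    lazy_object() : ptr_(nullptr) {}
    ~lazy_object() { destroy(); }

    // The in-place buffer cannot simply be copied, so copying is disabled.
    lazy_object(const lazy_object&) = delete;
    lazy_object& operator=(const lazy_object&) = delete;

    // Destroys the current object (if any) and constructs a new one in place.
    template <typename... Args>
    T& create(Args&&... args) {
        destroy();
        ptr_ = ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
        return *ptr_;
    }

    // Runs the destructor without releasing the storage.
    void destroy() {
        if (ptr_) {
            T* old = ptr_;
            ptr_ = nullptr;
            old->~T();
        }
    }

    T* get() { return ptr_; }  // nullptr until create() has been called
    const T* get() const { return ptr_; }

private:
    alignas(T) unsigned char storage_[sizeof(T)];  // raw, suitably aligned bytes
    T* ptr_;                                       // points into storage_ when live
};

// Hypothetical B, used only to exercise the sketch.
struct B {
    static int live;
    int sum;
    B(int p1, int p2, int p3, int p4) : sum(p1 + p2 + p3 + p4) { ++live; }
    ~B() { --live; }
};
int B::live = 0;

// Hypothetical owner: no dummy B in the constructor, no new/delete per call.
class A {
    lazy_object<B> b;
public:
    void calledVeryOften(int p1, int p2, int p3, int p4) { b.create(p1, p2, p3, p4); }
    const B* current() const { return b.get(); }
};

// Self-check; call it from main() to try the sketch.
void check() {
    A a;
    assert(a.current() == nullptr);
    a.calledVeryOften(1, 2, 3, 4);
    assert(a.current()->sum == 10 && B::live == 1);
}
```

Because the buffer lives inside the `lazy_object`, it sits wherever the owning `A` sits (stack, heap or static storage); the point is only that `create` never calls the allocator.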
Like the others have already suggested: try placement new.
Here is a complete example:
Erm, is there some reason you can't do this?
(or set them individually, since you don't have access to the `B` class - those values do have mutator methods, right?)
Are you sure that memory allocation is the bottleneck you think it is? Is B's constructor trivially fast?
If memory allocation is the real problem, then placement new or some of the other solutions here might well help.
If the types and ranges of the param[1..4] are reasonable, and the B constructor "heavy", you might also consider using a cached set of B. This presumes you are actually allowed to have more than one at a time, that it does not front a resource for example.
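A rough sketch of the cached-set idea, assuming the four parameters are plain `int`s and that holding several `B` objects at once is acceptable. The `B` below, the `Key` type, and the `current`, `cached` and `check` helpers are hypothetical and only there to illustrate the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical B with a "heavy" constructor; counts how often it runs.
struct B {
    static int constructions;
    int p1, p2, p3, p4;
    B(int a, int b, int c, int d) : p1(a), p2(b), p3(c), p4(d) { ++constructions; }
};
int B::constructions = 0;

class A {
    typedef std::tuple<int, int, int, int> Key;
    std::map<Key, B> cache_;  // one B per distinct parameter set
    const B* b_;              // currently selected B; owned by cache_
public:
    A() : b_(nullptr) {}
    A(const A&) = delete;     // b_ points into cache_, so copying is disabled
    A& operator=(const A&) = delete;

    void calledVeryOften(int p1, int p2, int p3, int p4) {
        const Key key(p1, p2, p3, p4);
        std::map<Key, B>::iterator it = cache_.find(key);
        if (it == cache_.end()) {
            // Miss: construct B in place inside a new map node.
            it = cache_.emplace(std::piecewise_construct,
                                std::forward_as_tuple(key),
                                std::forward_as_tuple(p1, p2, p3, p4)).first;
        }
        b_ = &it->second;     // hit or miss, no B is destroyed
    }

    const B* current() const { return b_; }
    std::size_t cached() const { return cache_.size(); }
};

// Self-check; call it from main() to try the sketch.
void check() {
    A a;
    a.calledVeryOften(1, 2, 3, 4);
    a.calledVeryOften(1, 2, 3, 4);  // served from the cache
    assert(a.cached() == 1);
}
```

Note that a miss still allocates (the map node), so this only pays off when the same parameter sets recur often and the set of distinct values stays small enough to keep in memory.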
Simply reserve the memory required for b (via a pool or by hand) and reuse it each time you delete/new instead of reallocating each time.
Example:
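A minimal sketch of the "by hand" variant, assuming C++11 and a `B` without over-aligned members: the raw memory is reserved once in A's constructor and every later call only runs B's destructor and constructor in that same block. The `B` shown here, and the `reset`, `current` and `check` helpers, are hypothetical.

```cpp
#include <cassert>
#include <new>  // placement new, ::operator new / ::operator delete

// Hypothetical B, used only to exercise the sketch.
struct B {
    static int live;
    int p1, p2, p3, p4;
    B(int a, int b, int c, int d) : p1(a), p2(b), p3(c), p4(d) { ++live; }
    ~B() { --live; }
};
int B::live = 0;

class A {
    void* mem_;  // raw storage for one B, reserved once
    B* b_;       // the live B inside mem_, or nullptr
public:
    A() : mem_(::operator new(sizeof(B))), b_(nullptr) {}
    ~A() {
        reset();
        ::operator delete(mem_);
    }
    A(const A&) = delete;  // the raw buffer is not shared or copied
    A& operator=(const A&) = delete;

    void calledVeryOften(int p1, int p2, int p3, int p4) {
        reset();                                // destroy the old B, keep the memory
        b_ = ::new (mem_) B(p1, p2, p3, p4);    // construct the new B in place
    }

    const B* current() const { return b_; }

private:
    void reset() {
        if (b_) {
            B* old = b_;
            b_ = nullptr;
            old->~B();
        }
    }
};

// Self-check; call it from main() to try the sketch.
void check() {
    {
        A a;
        a.calledVeryOften(1, 2, 3, 4);
        assert(B::live == 1);
    }
    assert(B::live == 0);  // ~A destroyed the B and released the block
}
```

There is exactly one allocation per `A`, however many times `calledVeryOften` runs.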
In some cases a Pool or ObjectPool could be a better implementation of the same idea.
The construction/destruction cost will then depend only on the constructor and destructor of the B class.
A quick test of Martin York's assertion that this is a premature optimisation, and that new/delete are optimised well beyond the ability of mere programmers to improve. Obviously the questioner will have to time his own code to see whether avoiding new/delete helps him, but it seems to me that for certain classes and uses it will make a big difference:
This is roughly what I expected: the GMan-style (destruct/placement new) code takes twice as long, and is presumably doing twice as much allocation. If the vector member of A is replaced with an int, then the GMan-style code takes a fraction of a second. That's GCC 3.
This I'm not so sure about, though: now the delete/new takes three times as long as the destruct/placement new version.
[Edit: I think I've figured it out - GCC 4 is faster on the 0-sized vectors, in effect subtracting a constant time from both versions of the code. Changing `(a*b)%2` to `(a*b)%2+1` restores the 2:1 time ratio, with 3.7s vs 7.5s]
Note that I've not taken any special steps to correctly align the stack array, but printing the address shows it's 16-aligned.
Also, -g doesn't affect the timings. I left it in accidentally after I was looking at the objdump to check that -O3 hadn't completely removed the loop; I've just re-run without it. The pointer is called yzz because searching for "y" didn't go quite as well as I'd hoped.