I wrote a custom operator new and operator delete for the class MyOrder. I am allocating memory using boost::singleton_pool. Here is the program I used to test the performance:
#include <boost/pool/singleton_pool.hpp>
#include <boost/progress.hpp>
#include <cstddef>
#include <iostream>
#include <new>
#include <string>
#include <vector>

class MyOrder {
    std::vector<int> v1_;
    std::vector<double> v2_;
    std::string s1_;
    std::string s2_;
public:
    MyOrder(std::string s1, std::string s2) : s1_(s1), s2_(s2) {}
    ~MyOrder() {}
    static void* operator new(std::size_t size);
    static void operator delete(void* rawMemory) throw();
};

struct MyOrderTag {};
typedef boost::singleton_pool<MyOrderTag, sizeof(MyOrder)> MyOrderPool;

void* MyOrder::operator new(std::size_t size)
{
    // Requests of any other size go to the global allocator.
    if (size != sizeof(MyOrder))
        return ::operator new(size);
    while (true) {
        void* ptr = MyOrderPool::malloc();
        if (ptr != NULL) return ptr;
        // Read the current new-handler without changing it.
        std::new_handler globalNewHandler = std::set_new_handler(0);
        std::set_new_handler(globalNewHandler);
        if (globalNewHandler) globalNewHandler();
        else throw std::bad_alloc();
    }
}

void MyOrder::operator delete(void* rawMemory) throw()
{
    if (rawMemory == 0) return;
    MyOrderPool::free(rawMemory);
}

int main()
{
    MyOrder* mo = NULL;
    std::vector<MyOrder*> v;
    v.reserve(100000);
    // Prints the elapsed time when it is destroyed at the end of main.
    boost::progress_timer howlong;
    for (int i = 0; i < 100000; ++i)
    {
        mo = new MyOrder("Sanket", "Sharma");
        v.push_back(mo);
    }
    for (std::vector<MyOrder*>::const_iterator it = v.begin(); it != v.end(); ++it)
    {
        delete *it;
    }
    return 0;
}
I compiled the above program with the -O2 flag and ran it on my MacBook with a 2.26 GHz Intel Core 2 Duo; it took 0.16 seconds. Then I commented out the lines that declare and define the custom operator new and operator delete, recompiled with -O2, and ran it on the same machine; it took 0.13 seconds.
Allocating and deallocating memory using singleton_pool for objects of the same size should speed things up. Why does it make the program slower? Or is the overhead of creating a pool nullifying the performance benefit gained in this small program?
Update:
I replaced the two std::string variables with an int and a double, and this time ran the two programs with 100000000 iterations each (i.e. 1000 times as many as before) on a 3.0 GHz AMD Phenom(tm) II X4 945 processor. The one using custom memory allocation takes 3.2 seconds, while the one using default memory allocation takes 8.26 seconds. So this time custom memory allocation wins.
I think your numbers are meaningless. If you only checked the runtime once and found 0.13 vs 0.16 seconds, then that is entirely meaningless and dominated by overhead. You must run the snippet you want to test thousands of times and then compare the data to rule out overhead.
No, really: that 0.03 seconds difference can easily be explained by your process getting switched out, etc.
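To make "run it many times and compare the data" concrete, here is a minimal sketch of a repeated-measurement harness. It is only an illustration: the names time_runs and median are hypothetical helpers, not part of Boost or the standard library, and it assumes a C++11 compiler for std::chrono and lambdas (the code in the question is C++03 style). The median is used because a single run that gets switched out inflates the mean but barely moves the median.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls `work` `repeats` times and returns the
// wall-clock duration of each call in seconds.
template <typename Work>
std::vector<double> time_runs(Work work, std::size_t repeats)
{
    std::vector<double> seconds;
    seconds.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        const std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now();
        work();
        const std::chrono::steady_clock::time_point stop =
            std::chrono::steady_clock::now();
        seconds.push_back(
            std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Hypothetical helper: median of the samples (0.0 for an empty set).
// Takes the vector by value so the caller's data is not reordered.
double median(std::vector<double> samples)
{
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    if (samples.size() % 2 == 1) return samples[mid];
    return (samples[mid - 1] + samples[mid]) / 2.0;
}
```

One way to use it would be to wrap the allocate-and-delete loops from the question in a function, call time_runs on it with a few hundred or thousand repeats for each build (with and without the custom operators), and compare the two medians rather than two single timings.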