Recently I was trying to do some performance benchmarks, comparing std::stack<int, std::vector<int>>
and my own simple implementation of stack (that use pre-allocated memory). Now I’m experiencing some strange behavior.
First thing I want to ask is this line in stack benchmark code:
// std::vector<int> magicVector(10);
When I uncomment this line, performance rise about 17% (benchmark time drops from 6.5 to 5.4 seconds). But the line should have no impact on the rest of the program because it's not modifying any others members. Besides, it doesn't matter if it is vector of int or vector of double...
Second thing I want to ask is big performance difference between my stack implementation and std::stack
. I was told that std::stack
should be as fast as my stack but results shows that my "FastStack" is twice as fast.
Results (with uncommented performance increasing line):
stack 5.38979
stack 5.34406
stack 5.32404
stack 5.30519
FastStack 2.59635
FastStack 2.59204
FastStack 2.59713
FastStack 2.64814
These results come from release build from VS2010 with /O2, /Ot, /Ob2 and others default optimizations. My CPU is Intel i5 3570k with default clock (3.6 GHz for one thread).
I put all code in one file so anyone can easily test it.
#define _SECURE_SCL 0
#include <iostream>
#include <vector>
#include <stack>
#include <Windows.h>
using namespace std;
//---------------------------------------------------------------------------------
//---------------------------------------------------------------------------------
// Purpose: High Resolution Timer
//---------------------------------------------------------------------------------
class HRTimer
{
public:
HRTimer();
double GetFrequency(void);
void Start(void) ;
double Stop(void);
double GetTime();
private:
LARGE_INTEGER start;
LARGE_INTEGER stop;
double frequency;
};
HRTimer::HRTimer()
{
frequency = this->GetFrequency();
}
double HRTimer::GetFrequency(void)
{
LARGE_INTEGER proc_freq;
if (!::QueryPerformanceFrequency(&proc_freq))
return -1;
return proc_freq.QuadPart;
}
void HRTimer::Start(void)
{
DWORD_PTR oldmask = ::SetThreadAffinityMask(::GetCurrentThread(), 0);
::QueryPerformanceCounter(&start);
::SetThreadAffinityMask(::GetCurrentThread(), oldmask);
}
double HRTimer::Stop(void)
{
DWORD_PTR oldmask = ::SetThreadAffinityMask(::GetCurrentThread(), 0);
::QueryPerformanceCounter(&stop);
::SetThreadAffinityMask(::GetCurrentThread(), oldmask);
return ((stop.QuadPart - start.QuadPart) / frequency);
}
double HRTimer::GetTime()
{
LARGE_INTEGER time;
::QueryPerformanceCounter(&time);
return time.QuadPart / frequency;
}
//---------------------------------------------------------------------------------
//---------------------------------------------------------------------------------
// Purpose: Should be faster than std::stack
//---------------------------------------------------------------------------------
template <class T>
class FastStack
{
public:
T* st;
int allocationSize;
int lastIndex;
public:
FastStack(int stackSize);
~FastStack();
inline void resize(int newSize);
inline void push(T x);
inline void pop();
inline T getAndRemove();
inline T getLast();
inline void clear();
};
template <class T>
FastStack<T>::FastStack( int stackSize )
{
st = NULL;
this->allocationSize = stackSize;
st = new T[stackSize];
lastIndex = -1;
}
template <class T>
FastStack<T>::~FastStack()
{
delete [] st;
}
template <class T>
void FastStack<T>::clear()
{
lastIndex = -1;
}
template <class T>
T FastStack<T>::getLast()
{
return st[lastIndex];
}
template <class T>
T FastStack<T>::getAndRemove()
{
return st[lastIndex--];
}
template <class T>
void FastStack<T>::pop()
{
--lastIndex;
}
template <class T>
void FastStack<T>::push( T x )
{
st[++lastIndex] = x;
}
template <class T>
void FastStack<T>::resize( int newSize )
{
if (st != NULL)
delete [] st;
st = new T[newSize];
}
//---------------------------------------------------------------------------------
//---------------------------------------------------------------------------------
//---------------------------------------------------------------------------------
// Purpose: Benchmark of std::stack and FastStack
//---------------------------------------------------------------------------------
int main(int argc, char *argv[])
{
#if 1
for (int it = 0; it < 4; it++)
{
std::stack<int, std::vector<int>> bStack;
int x;
for (int i = 0; i < 100; i++) // after this two loops, bStack's capacity will be 141 so there will be no more reallocating
bStack.push(i);
for (int i = 0; i < 100; i++)
bStack.pop();
// std::vector<int> magicVector(10); // when you uncomment this line, performance will magically rise about 18%
HRTimer timer;
timer.Start();
for (int i = 0; i < 2000000000; i++)
{
bStack.push(i);
x = bStack.top();
if (i % 100 == 0 && i != 0)
for (int j = 0; j < 100; j++)
bStack.pop();
}
double totalTime = timer.Stop();
cout << "stack " << totalTime << endl;
}
#endif
//------------------------------------------------------------------------------------
#if 1
for (int it = 0; it < 4; it++)
{
FastStack<int> fstack(200);
int x;
HRTimer timer;
timer.Start();
for (int i = 0; i < 2000000000; i++)
{
fstack.push(i);
x = fstack.getLast();
if (i % 100 == 0 && i != 0)
for (int j = 0; j < 100; j++)
fstack.pop();
}
double totalTime = timer.Stop();
cout << "FastStack " << totalTime << endl;
}
#endif
cout << "Done";
cin.get();
return 0;
}
.
EDIT: Since everybody talks about my really bad implementation of my stack I want to set things right. I created that stack in few minutes and I implemented just few features that I currently needed. It was never meant to be a replacement of std::stack :) or save to use in all cases. The only goal was to achieve maximum speed and correct results. I'm sorry about this misunderstanding… I just want to know few answers…
The many comments (and even answers) focus on the risks in your implementation. Yet the question stands.
As directly demonstrated below rectifying the perceived code shortcomings would not change anything significant about the performance.
Here is the OP's code modified to be (A) safe, and (B) supporting the same operations as
std::stack
, and (C) reserving buffer space also for thestd::stack
, in order to clarify things for those who mistakenly believe that this stuff matters for the performance:Results on this slow-as-molasses Samsung RC530 laptop:
And similarly for Visual C++.
Now let's look at a typical implementation of
std::vector::push_back
, which is called bystd::stack<T, std::vector<T>>::push
(in passing, I know of only 3 programmers who have ever used this indentation style, namely PJP, Petzold and myself; I now, since 1998 or thereabouts, think it's horrible!):I suspect that the measured inefficiency lies at least partly in all the stuff going on there, and perhaps it's also a matter of automatically generated safety checks.
For a debug build the
std::stack
performance is so extremely ungood that I gave up waiting for any result.EDIT: following Xeo’s comment below I updated
push
to check for "self-push" in the case of buffer reallocation, by factoring that out as a separate function:Mysteriously, although
reserveAndPush
is never called in this testing, it affects the performance – due to code size not fitting cache?EDIT 2: DeadMG showed that the code must be buggy. I believe the problem was a missing
return
, plus the expression computing new size (twice zero is still zero). He also pointed out that I forgot to showreserveAndPush
. Should be:Your method implementations are all broken. Ignoring the copy constructor and other missing operations, your
push
invokes UB if you push too much, and yourresize
is plainly broken as it does not copy over the previous data and it's not exception safe and your push isn't exception safe and you invoke too many copies and yourgetAndRemove
isn't exception safe and you don't destruct popped off elements and you don't construct new elements properly, only assign them and you needlessly default-construct when created, and there are probably more I haven't found.Basically, your class is extremely and hideously unsafe in every imaginable respect, destroys the user's data at the drop of a hat, calls all the wrong functions on
T
, and will go crying in a corner the instant an exception is thrown anywhere.It's a giant pile of bad and the fact that it's "faster" than
std::stack
is, well, entirely irrelevant, since all you've proven is that if you don't have to meet the requirements, you can go as fast as you like, which we all already knew.Fundamentally, as sbi said, you clearly don't understand the semantics of
std::stack
, nor important C++ aspects like exception safety, and the ways in which your code fails to work correctly is what makes it execute faster. You've got a long way to go, my friend.Contrary to a
std::stack
usingstd::vector
, your stack does not reallocate when it runs out of space, but simply blows up the planet. Allocation, however, is a huge drain on performance, so skipping on that will certainly gain you performance.However, in your place I'd grab one of the well-aged
static_vector
implementations floating on the web and stuff that intostd::stack
in place ofstd::vector
. That way, you skip all the performance-hungry dynamic memory handling, but you have a valid stack implementation with a container for memory handling underneath that's very likely to be much better than what you come up with.