I try to replace std::multiset with std::priority_queue. But I was dissapointed with the speed results. Running time of the algorithm increase by 50%...
Here are the corresponding commands:
top() = begin();
pop() = erase(knn.begin());
push() = insert();
I am surprised with the speed of priority_queue implementation, I expected different results (better for PQ)... Conceptually, the multiset is being used as a priority queue. Why are the priority queue and the multiset have such different performance, even with -O2
?
Average of ten results, MSVS 2010, Win XP, 32 bit, method findAllKNN2 () (see bellow, please)
MS
N time [s]
100 000 0.5
1 000 000 8
PQ
N time [s]
100 000 0.8
1 000 000 12
What could cause these results? No other changes of the source code have been made... Thanks for your help...
MS Implementation:
template <typename Point>
struct TKDNodePriority
{
KDNode <Point> *node;
typename Point::Type priority;
TKDNodePriority() : node ( NULL ), priority ( 0 ) {}
TKDNodePriority ( KDNode <Point> *node_, typename Point::Type priority_ ) : node ( node_ ), priority ( priority_ ) {}
bool operator < ( const TKDNodePriority <Point> &n1 ) const
{
return priority > n1.priority;
}
};
template <typename Point>
struct TNNeighboursList
{
typedef std::multiset < TKDNodePriority <Point> > Type;
};
Method:
template <typename Point>
template <typename Point2>
void KDTree2D <Point>::findAllKNN2 ( const Point2 * point, typename TNNeighboursList <Point>::Type & knn, unsigned int k, KDNode <Point> *node, const unsigned int depth ) const
{
if ( node == NULL )
{
return;
}
if ( point->getCoordinate ( depth % 2 ) <= node->getData()->getCoordinate ( depth % 2 ) )
{
findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
}
else
{
findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
}
typename Point::Type dist_q_node = ( node->getData()->getX() - point->getX() ) * ( node->getData()->getX() - point->getX() ) +
( node->getData()->getY() - point->getY() ) * ( node->getData()->getY() - point->getY() );
if (knn.size() == k)
{
if (dist_q_node < knn.begin()->priority )
{
knn.erase(knn.begin());
knn.insert ( TKDNodePriority <Point> ( node, dist_q_node ) );
}
}
else
{
knn.insert ( TKDNodePriority <Point> ( node, dist_q_node ) );
}
typename Point::Type dist_q_node_straight = ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) *
( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) ;
typename Point::Type top_priority = knn.begin()->priority;
if ( knn.size() < k || dist_q_node_straight < top_priority )
{
if ( point->getCoordinate ( node->getDepth() % 2 ) < node->getData()->getCoordinate ( node->getDepth() % 2 ) )
{
findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
}
else
{
findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
}
}
}
PQ implementation (slower, why?)
template <typename Point>
struct TKDNodePriority
{
KDNode <Point> *node;
typename Point::Type priority;
TKDNodePriority() : node ( NULL ), priority ( 0 ) {}
TKDNodePriority ( KDNode <Point> *node_, typename Point::Type priority_ ) : node ( node_ ), priority ( priority_ ) {}
bool operator < ( const TKDNodePriority <Point> &n1 ) const
{
return priority > n1.priority;
}
};
template <typename Point>
struct TNNeighboursList
{
typedef std::priority_queue< TKDNodePriority <Point> > Type;
};
Method:
template <typename Point>
template <typename Point2>
void KDTree2D <Point>::findAllKNN2 ( const Point2 * point, typename TNNeighboursList <Point>::Type & knn, unsigned int k, KDNode <Point> *node, const unsigned int depth ) const
{
if ( node == NULL )
{
return;
}
if ( point->getCoordinate ( depth % 2 ) <= node->getData()->getCoordinate ( depth % 2 ) )
{
findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
}
else
{
findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
}
typename Point::Type dist_q_node = ( node->getData()->getX() - point->getX() ) * ( node->getData()->getX() - point->getX() ) +
( node->getData()->getY() - point->getY() ) * ( node->getData()->getY() - point->getY() );
if (knn.size() == k)
{
if (dist_q_node < knn.top().priority )
{
knn.pop();
knn.push ( TKDNodePriority <Point> ( node, dist_q_node ) );
}
}
else
{
knn.push ( TKDNodePriority <Point> ( node, dist_q_node ) );
}
typename Point::Type dist_q_node_straight = ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) *
( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) ;
typename Point::Type top_priority = knn.top().priority;
if ( knn.size() < k || dist_q_node_straight < top_priority )
{
if ( point->getCoordinate ( node->getDepth() % 2 ) < node->getData()->getCoordinate ( node->getDepth() % 2 ) )
{
findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
}
else
{
findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
}
}
}
It appears that your compiler's optimization settings may have a large impact on performance. In the code below, multiset handily beats vector-based and deque-based priority_queue without optimization. However, with "-O3" optimization, the vector-based priority queue beats all. Now, these experiments were run on Linux with GCC, so perhaps you would get different results on Windows. I believe enabling optimization may remove many error-checking behaviors in STL's vector.
Without Optimization:
With -O2 Optimization:
With -O3 Optimization:
Test Harness (don't forget to link with -lrt):
The implementation of the
priority_queue
is the culprit, according to what I understand.priority_queue
's are implemented (underneath) as a specializedvector
ordeque
. Because priority_queue needs to have random-access iterators. When you pop or push items into thepriority_queue
, remaining items in the queue needs to be copied to the empty space and the same happens with insertion.multi_set
is based on keys.EDIT: Terribly sorry, multi_set is not based on hash key. I confused it with multi_map for some reason. But multi_set is a multiple sorted associative container and stores elements based on the key, which is same as the value. Due to the way elements are stored in a
multi_set
, it-- Quoted from SGI documentation.
This means that storage of a
multi_set
is not linear and hence the performance gain.This non-synthetic benchmark is taken from an actual usage of priority_queue. Feed this file in as stdin to run the benchmark.
Using clang++ with -O2 gets me:
In summary, priority_queue with vector consistently wins. In second place is priority_queue with deque, and in last is a multiset.
First of all, author didn't provide minimal example of code that leads to mentioned performance drop. Second, the question was asked 8 years ago, I'm sure compilers made a huge boost on performance.
I've made a benchmark example where I take 1st element in the queue then push in back with another priority (simulating push of new element without creating one), doing that by count of elements in array
kNodesCount
in a loop withkRunsCount
iterations. I'm comparingpriority_queue
withmultiset
andmultimap
. I've decided includemultimap
for more precise comparsion. It's simple test is very close to author use case, also I've tried to reproduce structs he used in the code samples.I've made release build with /Ox using VS v15.8.9. Take a look on results for 10'000'000 items in 10 runs:
Hmhmh, as you see
multiset
is just the same performance asmultimap
andpriority_queue
is the most fastest (around 43% faster). So why is that happen?Let's start from
priority_queue
, C++ standard doesn't tell us how to implement one or another container or structure, but in most cases it's based on a binary heap (look for msvc and gcc implementation)! In case ofpriority_queue
you have no access to any element except top, you can't iterate through them, get by index, or even take last element (it makes some space for optimization). Average insert for binary heap is O(1) and only the worst case is O(log n) and deletion is O(log n) since we taking element from the bottom then searching next high priority.What about
multimap
andmultiset
. They both usually implemented on red-black binary tree (look for msvc and gcc implementation), where average insert is O(log n) and deletion O(log n) either.From this point of view
priority_queue
NEVER can be slower ofmultiset
ormultimap
. So, back to your question,multiset
as priority queue is NOT faster thanpriority_queue
itself. There might be plenty of reasons, including rawpriority_queue
implementation on old compiler or wrong usage of this structure (the question doesn't contain minimal workable example), besides author did't mentioned compile flags or compiler version, sometimes optimization makes significant changes.UPDATE 1 upon @noɥʇʎԀʎzɐɹƆ request
Unfortunately I don't have access to linux environment right now, but I have mingw-w64 installed, version info: g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0. Used processor just the same as for visual studio: Processor Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 2001 Mhz, 4 Core(s), 8 Logical Processor(s).
So results for
g++ -O2
is:You may notice it's almost the same picture as for msvc.
UPDATE 2 thanks to @JorgeBellon
A quick-bench.com online benchmark link, check it yourself!
Would like to see any additions to my post, cheers!