Something faster than std::priority_queue
as a min heap?
The original question is here. You can resolve the names generated by grpof with the demangler. With the help of the users there, I came to a conclusion, where this block of code (that I expect to get executed much more times than I will perform a pop):
/**
* Min_heap is actually a std::priority_queue,
* with std::greater as a parameter.
*/
typedef std::priority_queue<std::tuple<float, int, int>,
std::vector<std::tuple<float, int, int> >,
std::greater<std::tuple<float, int, int> > > Min_heap;
...
void nn(..) { // the core function of my project
if(...) {
branch.push(std::make_tuple(new_dist, other_child_i, tree_i));
}
}
seems to be the bottleneck of my project (I think), as I can see after calling this:
gprof -q_ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i geraf gmon.out > analysis.txt
and got this:
granularity: each sample hit covers 2 byte(s) for 0.01% of 125.47 seconds
index % time self children called name
1967195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
105.54 0.09 320000/320000 Auto_random_kd_forest<float>::Auto_random_kd_forest(unsigned int&, unsigned int&, std::string const&, unsigned int, std::string const&, int, float, std::vector<std::vector<std::pair<float, int>, std::allocator<std::pair<float, int> > >, std::allocator<std::vector<std::pair<float, int>, std::allocator<std::pair<float, int> > > > >&, Params*, int) (1)
[2] 84.2 105.54 0.09 320000+1967195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
0.08 0.00 1967195/2031195 void std::__push_heap<__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> > >(__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> >) [9]
0.01 0.00 12000/12000 _ZNSt6vectorISt5tupleIJfiiEESaIS1_EE19_M_emplace_back_auxIJS1_EEEvDpOT_ [11]
1967195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
-----------------------------------------------
0.00 0.00 64000/2031195 _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIPSt5tupleIJfiiEESt6vectorIS3_SaIS3_EEEEiS3_St7greaterIS3_EEvT_T0_SC_T1_T2_ (10)
0.08 0.00 1967195/2031195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
[9] 0.1 0.09 0.00 2031195 void std::__push_heap<__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> > >(__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> >) [9]
-----------------------------------------------
0.01 0.00 12000/12000 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
[11] 0.0 0.01 0.00 12000 _ZNSt6vectorISt5tupleIJfiiEESaIS1_EE19_M_emplace_back_auxIJS1_EEEvDpOT_ [11]
I am pushing way more items than I pop. In the question I linked I show the code that is relative with my heap. We can assume that the heap never gets empty. I am not sure about the distribution.
See the heap comparison on the wiki:
http://en.wikipedia.org/wiki/Heap_(data_structure)
The fastest you can get is Fibonacci heap. AFAIK the STL priority queue is just a binary heap. You could improve on speed by using the boost implementation of the fibonacci heap. Also here is a question from SO showing the usage of the boost heap.
The thing worth mentioning is that the data shown in the wikipedia is theoretical comparison of heaps. The STL implementation is the binary heap because it is usually much faster then fibonacci heap for heaps of small size. The nice question summarising the fibonacci heap information is here.