I am using Point Cloud Library (PCL) based C++ implementation of kd-tree nearest neighbour(NN) searching. The data set contains about 2.2 million points. I am searching NN points for every other point. The search radius is set at 2.0. To fully compute that, it takes about 12 hours! I am using windows 64 bit machine with 4GB RAM. Is it very common for kd-tree searching? I wonder if there is any other c++ library for 3D point cloud data, which is faster. I heard about ANN C++ library & CGAL, but not sure how fast these are.
问题:
回答1:
In short:
You can only be sure if you run yourself the time measurements.
You should ensure if the NN library is faster over brute force, which probably is the case for your data.
As anderas mentioned in the comment, the search radius plays also a significant role. If a lot of points fall into the search radius, the search can become really slow.
Full answer:
3 dimensions are not much. The problem occurs due to the number of points you have.
ANN stands for approximate nearest neighbour searching. It is very common to accept the trade-off between accuracy and speed when it comes to NNS (Nearest Neighbour search).
This means that you perform the search faster, but you may not find the exact NN, but a close one (for example not the closest point, but the second closest one and so on). More speed means less accuracy and vice versa.
CGAL has also a parameter epsilon, which stands for accuracy (ε = 0 means full accuracy). CGAL is meant to go till 3 dimensions, so you could give it a shot.
I could just test myself and post the answers. However, this would not be 100% safe, since the points you have may have some relation. It is very important for the performance of the library, if it is going to exploit the relation the points (may) have to each other.
Another factor to take into account, easiness of installation.
CGAL is hard to install. When I did it, I followed these steps. ANN is easy to install. I would also suggest you to take a look at BOOST Geometry, which may come in handy.
FLANN is also a strong player in the field, but I would not suggest it, since it's meant to handle datasets of much bigger dimensions (like 128 for example).
....
OK I admit it, I am now curious myself and I am going to run some tests!
....
ANN
(I am not posting the code, so that the answer is not getting too big. There are examples in the documentation and you can ask of course if you want!). Output:
// for a Klein bottle
samaras@samaras-A15:~/Inria/nn_libs/ANN/ann_1.1.2/code$ g++ main.cpp -O3 -I ../include/ -L../lib -lANN -std=c++0x -o myExe
samaras@samaras-A15:~/Inria/nn_libs/ANN/ann_1.1.2/code$ ./myExe
N = 1000000
D = 3
ε = 0 // full accuracy
Misses: 0
creation of the tree took 1.06638 seconds.
Brute force for min: 0.009985598 seconds.
NN for 0.0 0.000078551 seconds.
Speedup of ANN for NN = 8.721533780
For ε = 0.1
I got:
Misses: 1000
creation of the tree took 0.727613 seconds.
Brute force for min: 0.006351318 seconds.
NN for 0.0 0.000004260 seconds.
Speedup of ANN for NN = 8.678169573
// for a shpere
ε = 0
Misses: 0
creation of the tree took 1.28098 seconds.
Brute force for min: 0.006382912 seconds.
NN for 0.0 0.000022341 seconds.
Speedup of ANN for NN = 4.897436311
ε = 0.1
Misses: 1000
creation of the tree took 1.25572 seconds.
Brute force for min: 0.006482465 seconds.
NN for 0.0 0.000004379 seconds.
Speedup of ANN for NN = 5.144413371
Notice the difference in the speedup! This has to do with the nature of the datasets, as explained above (the relation the points have).
CGAL:
// Klein bottle
samaras@samaras-A15:~/code/NN$ ./myExe
ε = 0
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.02478 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 471492
Tree depth: 34
0x80591e4
Brute force for min: 0.007979495 seconds.
NN for 0.0 0.008085704 seconds.
Speedup of cgal for NN = 0.983849423
ε = 0.1
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.02628 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 471492
Tree depth: 34
0x80591e4
Brute force for min: 0.007852951 seconds.
NN for 0.0 0.007856228 seconds.
Speedup of cgal for NN = 0.996250305
// Sphere
samaras@samaras-A15:~/code/NN$ ./myExe
ε = 0
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.025502 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 465002
Tree depth: 32
0x80591e4
Brute force for min: 0.007946504 seconds.
NN for 0.0 0.007981456 seconds.
Speedup of cgal for NN = 0.992449817
samaras@samaras-A15:~/code/NN$ ./myExe
ε = 0.1
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.025106 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 465002
Tree depth: 32
0x80591e4
Brute force for min: 0.008115519 seconds.
NN for 0.0 0.008117261 seconds.
Speedup of cgal for NN = 0.996702679
ANN is clearer faster than CGAL for my tests, probably it is for yours too!
A side-note:
You actually asking for the NN of every point. However, the library doesn't know that and doesn't take into account the previous work done for every point, which is a pity. However, I am not aware of any library that does this.