Given a set of n points in the plane, I want to preprocess these points somehow faster than O(n^2) (preferably O(n log(n))), and then be able to answer queries of the following kind: "How many of the n points lie inside a circle with a given center and radius?" faster than O(n) (preferably O(log(n))).
Can you suggest some data structure or algorithm I can use for this problem?
I know that such types of problems are often solved using Voronoi diagrams, but I don't know how to apply it here.
Build a KD-tree of the points; this should give you much better complexity than O(n), on average O(log(n)) I think.
You can use a 2D tree since the points are constrained to a plane.
Assuming that we have transformed the problem into 2D, we'll have something like this for the points:
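For illustration (the member names here are assumptions rather than the original code), a node could look like this:

```cpp
// One node of a 2D KD-tree built over the points.
struct Node {
    float pos[2];     // the point stored at this node
    int   splitaxis;  // 0 = split on x, 1 = split on y
    Node* greater;    // subtree with coordinates >= pos[splitaxis] along the split axis
    Node* less;       // subtree with coordinates <  pos[splitaxis] along the split axis
};
```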
greater and less contain the points with greater and lesser coordinates, respectively, along the splitaxis. Then you call a search function like the one sketched below with the root of the KD-tree.
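A minimal sketch of that search, assuming the Node layout above (the pruning test is the standard KD-tree range search, not necessarily the original poster's exact code):

```cpp
#include <vector>

// Collect every node whose point lies within 'radius' of 'origin',
// pruning any subtree whose half-plane the circle cannot reach.
void findPoints(Node* node, std::vector<Node*>& result,
                const float origin[2], float radius) {
    if (!node) return;

    float dx = node->pos[0] - origin[0];
    float dy = node->pos[1] - origin[1];
    if (dx * dx + dy * dy <= radius * radius)
        result.push_back(node);  // this point is inside the circle

    // Signed distance from the query center to this node's splitting plane.
    float d = node->pos[node->splitaxis] - origin[node->splitaxis];

    if (d <= radius)  findPoints(node->greater, result, origin, radius);  // circle reaches the 'greater' side
    if (d >= -radius) findPoints(node->less,    result, origin, radius);  // circle reaches the 'less' side
}
```

Called as findPoints(root, result, center, r), result.size() then answers the count query.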
Assuming you have a set of points S in a Cartesian plane with coordinates (xi, yi), given an arbitrary circle with center (xc, yc) and radius r, you want to find all the points contained within that circle.
I will also assume that the points and the circle may move so certain static structures that can speed this up won't necessarily be appropriate.
Three things spring to mind that can speed this up:
Firstly, you can check:

(xi - xc)^2 + (yi - yc)^2 <= r^2

instead of

sqrt((xi - xc)^2 + (yi - yc)^2) <= r

so there is no need to take a square root for every point.
Secondly, you can cull the list of points somewhat by remembering that a point can only be within the circle if:

xc - r <= xi <= xc + r, and
yc - r <= yi <= yc + r.
This is known as a bounding box. You can use it as either an approximation or to cut down your list of points to a smaller subset to check accurately with the first equation.
Lastly, sort your points in either x or y order and then you can do a bisection search to find the set of points that are possibly within your bounding box, further cutting down on unnecessary checks.
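As a rough sketch of how the three ideas combine (countInCircle and the Point struct are illustrative names, and the points are assumed to be pre-sorted by x):

```cpp
#include <algorithm>
#include <vector>

struct Point { float x, y; };

// 'pts' must already be sorted by ascending x.
int countInCircle(const std::vector<Point>& pts, float xc, float yc, float r) {
    // Bisection: only points with x in [xc - r, xc + r] can be inside the circle.
    auto lo = std::lower_bound(pts.begin(), pts.end(), xc - r,
                               [](const Point& p, float v) { return p.x < v; });
    auto hi = std::upper_bound(pts.begin(), pts.end(), xc + r,
                               [](float v, const Point& p) { return v < p.x; });

    int count = 0;
    for (auto it = lo; it != hi; ++it) {
        if (it->y < yc - r || it->y > yc + r) continue;  // bounding-box cull on y
        float dx = it->x - xc, dy = it->y - yc;
        if (dx * dx + dy * dy <= r * r) ++count;         // squared test, no sqrt
    }
    return count;
}
```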
If my goal were speed, and the number of points weren't huge (millions), I'd focus on memory footprint as much as algorithmic complexity.
An unbalanced k-d tree is best on paper, but it requires pointers, which can expand memory footprint by 3x+, so it is out.
A balanced k-d tree requires no storage, other than for an array with one scalar for each point. But it too has a flaw: the scalars cannot be quantized; they must be the same 32-bit floats as in the original points. If they are quantized, it is no longer possible to guarantee that a point which appears earlier in the array is either on the splitting plane or to its left, AND that a point which appears later in the array is either on the splitting plane or to its right.
There is a data structure I developed that addresses this problem. The Synergetics folks tell us that volume is experientially four-directional. Let's say that a plane is likewise experientially three-directional.
We're accustomed to traversing a plane by the four directions -x, +x, -y, and +y, but it's simpler to use the three directions a, b, and c, which point at the vertices of an equilateral triangle.
When building the balanced k-d tree, project each point onto the a, b, and c axes. Sort the points by increasing a. For the median point, round down, quantize and store a. Then, for the sub-arrays to the left and right of the median, sort by increasing b, and for the median points, round down, quantize, and store b. Recurse and repeat until each point has stored a value.
Then, when testing a circle (or whatever) against the structure, first calculate the maximum a, b, and c coordinates of the circle. This describes a triangle. In the data structure we made in the last paragraph, compare the median point's a coordinate to the circle's maximum a coordinate. If the point's a is larger than the circle's a, we can disqualify all points after the median. Then, for the sub-arrays to the left and right (if not disqualified) of the median, compare the circle's b to the median point's b coordinate. Recurse and repeat until there are no more points to visit.
This is similar in theme to the BIH data structure, but requires no intervals of -x and +x and -y and +y, because a, b, and c are just as good at traversing the plane, and require one fewer direction to do it.
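A small sketch of the projection step, assuming three unit axes 120 degrees apart (the constants and function names here are mine, not part of the structure's definition):

```cpp
#include <cmath>

// Project a 2D point onto the three axes a, b, c, which point at the
// vertices of an equilateral triangle (120 degrees apart).
void projectABC(float x, float y, float& a, float& b, float& c) {
    const float s = std::sqrt(3.0f) / 2.0f;  // sin(120 degrees)
    a = x;                                   // axis a = ( 1,    0       )
    b = -0.5f * x + s * y;                   // axis b = (-1/2,  sqrt3/2 )
    c = -0.5f * x - s * y;                   // axis c = (-1/2, -sqrt3/2 )
}

// Maximum a, b, c coordinates reached by a circle: project the center and
// add the radius, since each axis has unit length.
void circleMaxABC(float xc, float yc, float r,
                  float& aMax, float& bMax, float& cMax) {
    projectABC(xc, yc, aMax, bMax, cMax);
    aMax += r; bMax += r; cMax += r;
}
```

During the traversal described above, a node's stored median coordinate is compared against the corresponding maximum from circleMaxABC to decide whether the points after the median can be disqualified.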
Depending on how much precomputing time you have, you could build a tree like this:

The first level of branches are x-values, below them are y-value nodes, and below them are radius-value nodes. At each leaf is a hashset of points.

When you want to compute the points at (x, y, r): go through your tree and take the branch that matches your x and y values the closest. When you get down to the bottom level, you need to do a little matching (constant-time stuff), but you can find one radius such that all the points in that tree circle (defined by the path in the tree) are inside the circle specified by (x, y, r), and another circle (same x_tree, y_tree as before, but a different r_tree) such that all of the points in the original circle (specified by (x, y, r)) are inside it.

From there, go through all the points in the larger of the two tree circles: if a point is in the smaller circle, add it to the results; if not, run the distance check on it (see the sketch below).

The only problem is that it takes a very long time to precompute the tree, although you can control how much time you spend by changing how many x, y, and r branches you want to have in your tree.
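A sketch of just that final filtering step, where (xTree, yTree, innerR) is the smaller tree circle and the candidates are the points already known to lie in the larger tree circle (all names here are illustrative):

```cpp
#include <vector>

struct Point { float x, y; };

// Count the candidates that fall inside the query circle (xc, yc, r).
// Points inside the smaller tree circle are accepted without a distance check.
int filterCandidates(const std::vector<Point>& candidates,
                     float xc, float yc, float r,
                     float xTree, float yTree, float innerR) {
    int count = 0;
    for (const Point& p : candidates) {
        float dxt = p.x - xTree, dyt = p.y - yTree;
        if (dxt * dxt + dyt * dyt <= innerR * innerR) { ++count; continue; }  // guaranteed inside
        float dx = p.x - xc, dy = p.y - yc;
        if (dx * dx + dy * dy <= r * r) ++count;                              // explicit check
    }
    return count;
}
```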
Build a spatial subdivision structure such as a quadtree or KD-tree over the points. At each node, store the number of points covered by that node. Then, when you need to count the points covered by the lookup circle, traverse the tree and, for each subdivision in a node: if it is fully outside the circle, ignore it; if it is fully inside the circle, add its count to the total; if it intersects the circle, recurse. When you reach a leaf, check the point(s) inside the leaf for containment.
This is still O(n) worst case (for instance if all the points lie on the circle perimeter) but average case is O(log(n)).
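A minimal sketch of that traversal for a quadtree variant, assuming each node stores its point count and its axis-aligned bounds (the node layout and names are illustrative):

```cpp
#include <algorithm>
#include <vector>

struct Point { float x, y; };

struct QNode {
    float xmin, ymin, xmax, ymax;  // bounds of this cell
    int   count = 0;               // number of points covered by this cell
    QNode* child[4] = {};          // all null for a leaf
    std::vector<Point> points;     // only filled in leaves
};

// Squared distance from (xc, yc) to the nearest point of the cell (0 if inside).
static float minDist2(const QNode& n, float xc, float yc) {
    float dx = std::max({n.xmin - xc, 0.0f, xc - n.xmax});
    float dy = std::max({n.ymin - yc, 0.0f, yc - n.ymax});
    return dx * dx + dy * dy;
}

// Squared distance from (xc, yc) to the farthest corner of the cell.
static float maxDist2(const QNode& n, float xc, float yc) {
    float dx = std::max(xc - n.xmin, n.xmax - xc);
    float dy = std::max(yc - n.ymin, n.ymax - yc);
    return dx * dx + dy * dy;
}

int countInCircle(const QNode* n, float xc, float yc, float r) {
    if (!n || n->count == 0) return 0;
    float r2 = r * r;
    if (minDist2(*n, xc, yc) > r2)  return 0;         // fully outside: ignore
    if (maxDist2(*n, xc, yc) <= r2) return n->count;  // fully inside: take the stored count
    if (!n->child[0]) {                               // leaf: test the individual points
        int c = 0;
        for (const Point& p : n->points) {
            float dx = p.x - xc, dy = p.y - yc;
            if (dx * dx + dy * dy <= r2) ++c;
        }
        return c;
    }
    int total = 0;                                    // partial overlap: recurse
    for (QNode* ch : n->child)
        total += countInCircle(ch, xc, yc, r);
    return total;
}
```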
I used Andreas's code, but it contains a bug. For example, I had two points on the plane, [13, 2] and [13, -1], and my origin point was [0, 0] with a radius of 100. It finds only 1 point. The difference in my fix is that Andreas checked for no children with just if (!root->greater), which is not complete. I, on the other hand, don't do that check; I simply check whether the root is valid. Let me know if you find a bug.
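Since the fix code itself isn't reproduced above, here is a sketch of the idea, assuming the Node layout from the KD-tree answer (the validity check moves to the top of the recursion instead of testing for a missing greater child):

```cpp
#include <vector>

void findPoints(Node* root, std::vector<Node*>& result,
                const float origin[2], float radius) {
    if (!root) return;  // check the node itself rather than if (!root->greater)

    float dx = root->pos[0] - origin[0];
    float dy = root->pos[1] - origin[1];
    if (dx * dx + dy * dy <= radius * radius)
        result.push_back(root);

    // Recurse into whichever half-spaces the circle can reach.
    float d = root->pos[root->splitaxis] - origin[root->splitaxis];
    if (d <= radius)  findPoints(root->greater, result, origin, radius);
    if (d >= -radius) findPoints(root->less,    result, origin, radius);
}
```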