Let's imagine we have a plane with some points on it. We also have a circle of given radius.
I need an algorithm that determines such position of the circle that it covers maximal possible number of points. Of course, there are many such positions, so the algorithm should return one of them.
Precision is not important and the algorithm may make small mistakes.
Here is an example picture:
Input:

- `int n` (n <= 50) – number of points;
- `float x[n]`, `float y[n]` – arrays with the points' X and Y coordinates;
- `float r` – radius of the circle.

Output:

- `float cx`, `float cy` – coordinates of the circle's center.
Lightning speed of the algorithm is not required, but it shouldn't be too slow (because I know a few slow solutions for this situation).
C++ code is preferred, but not obligatory.
Might I suggest a density map? Find the min and max bounds for the x and y. Divide the range of the x and y bounds into bins having widths equal to the diameter of your circle. Count the number of points in each bin for the x and y separately. Now find the intersection on your density map between the highest ranked x bin with the highest ranked y bin.
This is a VERY fast algorithm for quickly generalizing large data sets, but it is not always accurate. To improve accuracy you can slice the bins into smaller and smaller pieces, or shift the bin positions left or right n times and use a voting system to select the answer that occurs most often across trials.
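A sketch of this binning idea (the helper name is mine, and only a single un-shifted grid with bin width equal to the diameter is used; shifting and voting, as suggested above, would refine it):

```cpp
#include <vector>
#include <cmath>
#include <algorithm>
#include <utility>

// Approximate circle center via a density map: bin the x and y coordinates
// separately (bin width = 2*r) and intersect the densest x bin with the
// densest y bin, returning the center of that cell.
std::pair<float, float> densityMapCenter(const std::vector<float>& x,
                                         const std::vector<float>& y,
                                         float r) {
    float minX = *std::min_element(x.begin(), x.end());
    float maxX = *std::max_element(x.begin(), x.end());
    float minY = *std::min_element(y.begin(), y.end());
    float maxY = *std::max_element(y.begin(), y.end());
    float w = 2.0f * r;  // bin width = circle diameter
    int nx = std::max(1, (int)std::ceil((maxX - minX) / w));
    int ny = std::max(1, (int)std::ceil((maxY - minY) / w));
    std::vector<int> xBins(nx, 0), yBins(ny, 0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        int bx = std::min(nx - 1, (int)((x[i] - minX) / w));
        int by = std::min(ny - 1, (int)((y[i] - minY) / w));
        ++xBins[bx];
        ++yBins[by];
    }
    int bestX = (int)(std::max_element(xBins.begin(), xBins.end()) - xBins.begin());
    int bestY = (int)(std::max_element(yBins.begin(), yBins.end()) - yBins.begin());
    // Center of the intersection cell of the two densest bins.
    return { minX + (bestX + 0.5f) * w, minY + (bestY + 0.5f) * w };
}
```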
The problem reduces to finding the global optimum of a function f : R x R -> N. The input of f is the center point of the circle, and its value, of course, is the number of covered points from the set. The graph of the function will be discontinuous and staircase-like. You could start by evaluating the function at each point of the set, order the points by decreasing values of f, then intensify the search around those points (for example, moving outward along a spiral).
Another option is to consider all intersection points of segments connecting pairs of points in the set. Your optimal point will be at one of these intersections, I think, but their number is probably too big to consider.
You may also hybridise options 1 and 2, and consider intersections of the segments generated from points around 'good set points'.
Anyhow, unless the number of set points is low (allowing you to check all intersections), I don't think you can guarantee to find the optimum (just a good solution).
At first glance, I would say a quad-tree solution.
Also, there is an information visualization / data mining method called K-means which makes clusters of the given data. It can be used, with some added functionality at the end, to fit your purpose.
The basic algorithm for K-means is:

1. Choose k initial cluster centers (for example, k of the points at random).
2. Assign each point to its nearest center.
3. Recompute each center as the mean of the points assigned to it.
4. Repeat steps 2-3 until the assignments stop changing.
The added functionality would then be: try each resulting cluster center as a candidate circle center and return the one that covers the most points.
Sources:
K-means algorithm - Linköping University
Quadtree - en.wikipedia.org/wiki/Quadtree
A quick search on Wikipedia also finds source code: en.wikipedia.org/wiki/K-means_clustering
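A minimal sketch of this approach (the naive seeding, the fixed iteration count, and the interpretation of the "added functionality" as trying each centroid as a candidate circle center are my own choices):

```cpp
#include <vector>
#include <cmath>
#include <utility>

struct Pt { float x, y; };

// K-means (Lloyd's algorithm) followed by picking the centroid that covers
// the most points when used as the circle's center. Requires pts.size() >= k.
std::pair<float, float> kmeansCircle(const std::vector<Pt>& pts, float r, int k) {
    // Naive seeding: centroids start on the first k points.
    std::vector<Pt> c(pts.begin(), pts.begin() + k);
    std::vector<int> label(pts.size(), 0);
    for (int iter = 0; iter < 20; ++iter) {
        // Assignment step: each point goes to its nearest centroid.
        for (std::size_t i = 0; i < pts.size(); ++i) {
            float best = 1e30f;
            for (int j = 0; j < k; ++j) {
                float dx = pts[i].x - c[j].x, dy = pts[i].y - c[j].y;
                float d = dx * dx + dy * dy;
                if (d < best) { best = d; label[i] = j; }
            }
        }
        // Update step: move each centroid to the mean of its cluster.
        std::vector<Pt> sum(k, Pt{0, 0});
        std::vector<int> cnt(k, 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            sum[label[i]].x += pts[i].x;
            sum[label[i]].y += pts[i].y;
            ++cnt[label[i]];
        }
        for (int j = 0; j < k; ++j)
            if (cnt[j] > 0) c[j] = Pt{ sum[j].x / cnt[j], sum[j].y / cnt[j] };
    }
    // Added step: the centroid covering the most points wins.
    int bestCover = -1;
    std::pair<float, float> bestC{c[0].x, c[0].y};
    for (int j = 0; j < k; ++j) {
        int cover = 0;
        for (const Pt& p : pts) {
            float dx = p.x - c[j].x, dy = p.y - c[j].y;
            if (dx * dx + dy * dy <= r * r) ++cover;
        }
        if (cover > bestCover) { bestCover = cover; bestC = {c[j].x, c[j].y}; }
    }
    return bestC;
}
```

Note that k-means optimizes cluster compactness, not circle coverage, so this is only a heuristic: a dense group may be split across clusters.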
If it is true that precision is not important and the algorithm may make small mistakes, then I suggest the following.

Let f(x,y) be a function that has a maximum at the point (0,0) and is only significant at points inside a circle of radius R. For example, f(x,y) = e^{-(x^2 + y^2) / (2 R^2)} (the exponent must be negative so that the peak is at the origin). Let (x_i, y_i) be the given points and E_i(x,y) = f(x - x_i, y - y_i). Your problem is then to find the maximum of \sum_i E_i(x,y). You can use gradient ascent, starting it from each point.
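A sketch of this idea (the step size, iteration count, and restart-from-every-point loop are my own choices; note the negative exponent so each Gaussian peaks at its point):

```cpp
#include <vector>
#include <cmath>
#include <utility>

// Maximize E(x,y) = sum_i exp(-((x-x_i)^2 + (y-y_i)^2) / (2 r^2)) by
// fixed-step gradient ascent, restarted from each point; keep the best result.
std::pair<float, float> gaussianAscent(const std::vector<float>& px,
                                       const std::vector<float>& py, float r) {
    auto E = [&](float x, float y) {
        float s = 0;
        for (std::size_t i = 0; i < px.size(); ++i) {
            float dx = x - px[i], dy = y - py[i];
            s += std::exp(-(dx * dx + dy * dy) / (2 * r * r));
        }
        return s;
    };
    float bestX = px[0], bestY = py[0], bestE = E(bestX, bestY);
    for (std::size_t s = 0; s < px.size(); ++s) {   // restart from each point
        float x = px[s], y = py[s];
        for (int it = 0; it < 200; ++it) {
            float gx = 0, gy = 0;                   // gradient of E
            for (std::size_t i = 0; i < px.size(); ++i) {
                float dx = x - px[i], dy = y - py[i];
                float w = std::exp(-(dx * dx + dy * dy) / (2 * r * r)) / (r * r);
                gx -= w * dx;  // dE/dx
                gy -= w * dy;  // dE/dy
            }
            x += 0.1f * r * r * gx;  // step scaled by r^2 so units work out
            y += 0.1f * r * r * gy;
        }
        float e = E(x, y);
        if (e > bestE) { bestE = e; bestX = x; bestY = y; }
    }
    return {bestX, bestY};
}
```

The smoothed objective trades exactness for differentiability: the maximizer of E is near, but not exactly at, the center covering the most points, which matches the "small mistakes are fine" requirement.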
Basic observations:

- If an optimal circle covers at least two points, it can be translated, without losing any covered point, until two of the covered points lie on its boundary.
- For a pair of points at distance d <= 2r, there are at most two circles of radius r whose boundary passes through both.

The algorithm is then:

1. For each point, test the circle centered on that point.
2. For each pair of points no farther apart than 2r, compute the (at most two) centers at distance r from both points and test those circles.
3. For every candidate center, count the covered points and keep the best. This is O(n^3) overall, which is fine for n <= 50.
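One concrete O(n^3) implementation of this pair-based candidate search might look like this (the names and the epsilon tolerance are mine):

```cpp
#include <vector>
#include <cmath>
#include <utility>

struct P { float x, y; };

// Exact search over candidate centers: every point itself, plus the (at most
// two) centers at distance r from each pair of points closer than 2r.
std::pair<float, float> bestCircle(const std::vector<P>& pts, float r) {
    auto count = [&](float cx, float cy) {
        int c = 0;
        for (const P& p : pts) {
            float dx = p.x - cx, dy = p.y - cy;
            if (dx * dx + dy * dy <= r * r + 1e-6f) ++c;  // small tolerance
        }
        return c;
    };
    float bx = pts[0].x, by = pts[0].y;
    int best = count(bx, by);
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            float dx = pts[j].x - pts[i].x, dy = pts[j].y - pts[i].y;
            float d2 = dx * dx + dy * dy;
            if (d2 > 4 * r * r) continue;       // pair too far apart
            if (d2 < 1e-12f) continue;          // duplicate point
            // Midpoint plus/minus an offset along the perpendicular gives
            // the two centers at distance r from both points.
            float mx = (pts[i].x + pts[j].x) / 2, my = (pts[i].y + pts[j].y) / 2;
            float h = std::sqrt(std::max(0.0f, r * r - d2 / 4));
            float d = std::sqrt(d2);
            float ux = -dy / d, uy = dx / d;    // unit perpendicular
            for (float s : {1.0f, -1.0f}) {
                float cx = mx + s * h * ux, cy = my + s * h * uy;
                int c = count(cx, cy);
                if (c > best) { best = c; bx = cx; by = cy; }
            }
        }
    return {bx, by};
}
```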
Here are two ideas: an O(n) approximation algorithm and an O(n^2 log n) exact (non-approximate) algorithm:
Fast approximation
Use locality-sensitive hashing. Basically, hash each point to a hash bucket that contains all nearby points. The buckets are set up so that collisions only happen between nearby points - unlike similarly-named hash tables, collisions are useful here. Keep a running count of the number of collisions in a bucket, and then use the center of that bucket as the center of your circle.
I admit that this is a fast explanation of a concept that is not super-obvious the first time you hear of it. An analogy would be to ask a bunch of people what their zip code is, and use the most common zip code to determine the most populous circle. It's not perfect, but a good fast heuristic.
It's basically linear time in terms of the number of points, and you can update your data set on the fly to incrementally get new answers in constant time per point (constant with respect to n = number of points).
More about locality-sensitive hashes in general or about my personal favorite version that would work in this case.
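As a rough single-grid stand-in for such a hash (a real LSH scheme would use several shifted or randomized grids to reduce boundary effects; the cell size and names here are my own choices):

```cpp
#include <map>
#include <vector>
#include <cmath>
#include <utility>

// Hash each point to a grid cell of side 2*r; the cell's collision count is
// the number of points hashed into it. Return the center of the fullest cell.
std::pair<float, float> gridHashCenter(const std::vector<float>& x,
                                       const std::vector<float>& y, float r) {
    float w = 2.0f * r;
    std::map<std::pair<long, long>, int> buckets;
    for (std::size_t i = 0; i < x.size(); ++i) {
        long cx = (long)std::floor(x[i] / w);
        long cy = (long)std::floor(y[i] / w);
        ++buckets[{cx, cy}];  // collision count = points in this cell
    }
    std::pair<long, long> best = buckets.begin()->first;
    int bestCount = 0;
    for (const auto& kv : buckets)
        if (kv.second > bestCount) { bestCount = kv.second; best = kv.first; }
    return { (best.first + 0.5f) * w, (best.second + 0.5f) * w };
}
```

Incremental updates are then just one hash-and-increment per new point, which is where the constant-time-per-point claim above comes from.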
Better-than-brute-force deterministic approach
The idea is: for each point, place the edge of a circle on that point and sweep the circle around to see which direction contains the most points. Do this for all points and we find the global max.
The sweep around a point p can be done in O(n log n) time by: (a) finding, for each other point q, the interval of angles theta such that the circle placed at angle theta (while touching p) contains q; and (b) sorting the interval endpoints so that we can march/sweep around p in linear time.
So it takes O(n log n) time to find the best circle touching a fixed point p, and then multiply that by O(n) to perform the check for all points. The total time is O(n^2 log n).
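A sketch of this sweep (the event-handling details, such as splitting intervals that wrap around +/- pi, are my own):

```cpp
#include <vector>
#include <cmath>
#include <algorithm>
#include <utility>

struct Q { float x, y; };

// Angular sweep: keep point p on the circle's boundary, so the center lies at
// (p.x + r*cos(theta), p.y + r*sin(theta)). A point q within 2r of p is
// covered exactly for theta in [A - B, A + B], where A is the direction from
// p to q and B = acos(d / (2r)). Sweeping the sorted endpoints finds the best
// theta for p in O(n log n); doing this for every p gives O(n^2 log n).
std::pair<float, float> angularSweep(const std::vector<Q>& pts, float r) {
    const float PI = 3.14159265358979f;
    int bestCount = 0;
    std::pair<float, float> bestC{pts[0].x + r, pts[0].y};
    for (std::size_t i = 0; i < pts.size(); ++i) {
        std::vector<std::pair<float, int>> ev;  // (angle, +1 start / -1 end)
        for (std::size_t j = 0; j < pts.size(); ++j) {
            if (j == i) continue;
            float dx = pts[j].x - pts[i].x, dy = pts[j].y - pts[i].y;
            float d = std::sqrt(dx * dx + dy * dy);
            if (d > 2 * r) continue;
            float A = std::atan2(dy, dx);
            float B = std::acos(d / (2 * r));
            float lo = A - B, hi = A + B;
            // Split intervals that wrap past -PI or PI.
            if (lo < -PI) { ev.push_back({lo + 2 * PI, +1}); ev.push_back({PI, -1}); lo = -PI; }
            if (hi > PI)  { ev.push_back({-PI, +1}); ev.push_back({hi - 2 * PI, -1}); hi = PI; }
            ev.push_back({lo, +1});
            ev.push_back({hi, -1});
        }
        // At equal angles, process starts before ends (closed intervals).
        std::sort(ev.begin(), ev.end(),
                  [](const std::pair<float, int>& a, const std::pair<float, int>& b) {
                      return a.first < b.first || (a.first == b.first && a.second > b.second);
                  });
        int cur = 1;  // the circle through p always covers p itself
        for (const auto& e : ev) {
            cur += e.second;
            if (cur > bestCount) {
                bestCount = cur;
                bestC = { pts[i].x + r * std::cos(e.first),
                          pts[i].y + r * std::sin(e.first) };
            }
        }
    }
    return bestC;
}
```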