Algorithm to cover maximal number of points with o

2020-02-16 12:27发布

站内文章 / C++

100 0

该账号已被封号

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Let's imagine we have a plane with some points on it. We also have a circle of given radius.

I need an algorithm that determines such position of the circle that it covers maximal possible number of points. Of course, there are many such positions, so the algorithm should return one of them.
Precision is not important and the algorithm may do small mistakes.

Here is an example picture:

Input:
  int n (n<=50) – number of points;
  float x[n] and float y[n] – arrays with points' X and Y coordinates;
  float r – radius of the circle.

Output:
float cx and float cy – coordinates of the circle's center

Lightning speed of the algorithm is not required, but it shouldn't be too slow (because I know a few slow solutions for this situation).

C++ code is preferred, but not obligatory.

回答1:

Edited to better wording, as suggested :

Basic observations :

I assume the radius is one, since it doesn't change anything.
given any two points, there exists at most two unit circles on which they lie.
given a solution circle to your problem, you can move it until it contains two points of your set while keeping the same number of points of your set inside it.

The algorithm is then:

For each pair of points, if their distance is < 2, compute the two unit circles C1 and C2 that pass through them.
Compute the number of points of your set inside C1 and C2
Take the max.

回答2:

This is the "disk partial covering problem" in the literature -- that should give you a good place to start googling. Here's a paper covering one possible solution, but it is a little intense mathematically: Approximation Algorithms Design for Disk Partial Covering Problem

As a matter of fact, this falls in the area called computational geometry, which is fascinating but can be hard to get a toehold in. There's a good overview by deBerg on various algorithms related to the subject.

回答3:

If you want something simple, take random position (x,y), calculate number of points inside circle and compare with previous position. Take the maximum. Repeat the operation any times you want.

Why the hell downvote? Ever heard about Monte Carlo methods? Actually for a huge amount of points, deterministic algorithm may not finish in reasonable time.

回答4:

The problem reverts back to finding the global optimum of a function f :R x R -> N. The input for f is the center point of the circle, and the value, of course, is the number of included points from the set. The graph of the function will be discontinuous, staircase-like.

You could start by testing the function in each point corresponding to a point in the set, order the points by decreasing values of f, then intensify search around those points (for example moving out along a spiral).

Another option is to consider all intersection points of segments connecting any pairs of points in the set. Your optimal point will be at one of these intersections I think, but their number is probably too big to consider.

You may also hybridise options 1 and 2, and consider intersections of the segments generated from points around 'good set points'.

Anyhow, unless the number of set points is low (allowing you to check all intersections), I don't think you can guarantee to find the optimum (just a good solution).

回答5:

At a first glance, I would say a quad tree solution.

Also, there is an information visualization/Data mining method called K-means which makes clusters of given data. It can be used with added functionality in the end to fit your purpose.

The basic algorithm for K-Means is:

Place K points into the space represented by the items that are being clustered - These points represent initial group centroids
Assign each data item to the group that has the closest centroid
When all items have been assigned, recalculate the positions of the K centroids by calculating the mean position of your dots
Repeat Steps 2 and 3 until the centroids no longer move, or move very little

The added functionality would then be:

Calculate number of points within your circle positioned at the K centroids
Choose the one that suits you the best ;)

Sources:
K-means algorithm - Linköping University
Quadtree - en.wikipedia.org/wiki/Quadtree

A quick search on wikipedia and you find source code: en.wikipedia.org/wiki/K-means_clustering

回答6:

Here are two ideas: an O(n) approximation algorithm and an O(n^2 log n) exact (non-approximate) algorithm:

Fast approximation

Use locality-sensitive hashing. Basically, hash each point to a hash bucket that contains all nearby points. The buckets are set up so that collisions only happen between nearby points - unlike similarly-named hash tables, collisions are useful here. Keep a running count of the number of collisions in a bucket, and then use the center of that bucket as the center of your circle.

I admit that this is a fast explanation of a concept that is not super-obvious the first time you hear of it. An analogy would be to ask a bunch of people what their zip code is, and use the most common zip code to determine the most populous circle. It's not perfect, but a good fast heuristic.

It's basically linear time in terms of the number of points, and you can update your data set on the fly to incrementally get new answers in constant time per point (constant with respect to n = number of points).

More about locality-sensitive hashes in general or about my personal favorite version that would work in this case.

Better-than-brute-force deterministic approach

The idea is: for each point, place the edge of a circle on that point and sweep the circle around to see which direction contains the most points. Do this for all points and we find the global max.

The sweep around a point p can be done in n log n time by: (a) finding the angle interval per other point q such that when we place the circle at angle theta, then it contains q; and (b) sorting the intervals so that we can march/sweep around p in linear time.

So it takes O(n log n) time to find the best circle touching a fixed point p, and then multiply that by O(n) to perform the check for all points. The total time is O(n^2 log n).

回答7:

If it is true that precision is not important and algorithm may do small mistakes then I think the following.

Let f(x,y) is a function which has a maximum at the point (0,0) and is only significant at the points inside of circle of radius R. For example, f(x,y) = e^{(x^2 + y^2)/ (2 * R^2)}.

Let (x_i,y_i) are points and E_i(x,y) = f(x - x_i, y - y_i).

Your problem is to find the maximum of \sum_i E_i(x,y) .

You can use a gradient descent starting it from each point.

回答8:

Might I suggest a density map? Find the min and max bounds for the x and y. Divide the range of the x and y bounds into bins having widths equal to the diameter of your circle. Count the number of points in each bin for the x and y separately. Now find the intersection on your density map between the highest ranked x bin with the highest ranked y bin.

This is a VERY fast algorithm to quickly generalize large data sets but it is not always accurate, to improve accuracy you can slice the bins into smaller and smaller pieces or shift the bin positions to the left or right n times and use a voting system to select the answer that occurs most often between trials.

回答9:

You could pixelize the whole area, then go to each point and increase the value of all pixels within the circle of the radius around that point. The pixels with the highest sum are good candidates.

Of course, you might lose some good areas or "hallucinate" good areas due to rounding errors. Perhaps you could try to do a rough pixellation first, then refine the promising areas.