In my program, I have a set of points. For purposes of rescaling, I am searching for the two nodes that are furthest apart, and then computing a factor by which to multiply all coordinates so that the maximum distance is equal to some predefined one I define.
The algorithm I am using to find the two points furthest apart is, however, problematic for large sets of points, as it is O(n^2)
; pseudocode (distances that were already calculated are skipped):
for each point in points:
for each other point in points:
if distance between point and other point > max
max = distance between point and other point
Is there something faster?
If you perform this query often but the points do not change much, you can perform precalculations that can speed things up.
Each point can store the farthest point from it and recheck on every point addition if the new point is farther.
When you query you just go thru all the points and look at their cached points.
You end up with O(n) for new point entry and O(n) for farthest apart query.
If you just need the scale and not the exact points you can do this in O(n) time with some error margin. Think about the simple case of making a bounding box. Calculate the minimum x value from all the points, the maximum x, the minimum y and the maximum y. These four numbers give you a maximum bounding box around your points with max error of 1 - (1/sqrt(2)) about 30%. You can reduce this by adding more sides to your square. Think about the case of an octagon. To calculate the min and max values for the other sides you have to rotate your coordinate system.
Error vs run time breaks down like this.
shape - run time - max error
Here's the equation for the max error I came up with.
Let me know if I should add a diagram explaining this.
As mentioned in this answer, you are seeking the "diameter" of the set of N points, a well known problem in computational geometry. There are basically two steps:
Find the convex hull of the points. Algorithms exist that are O(N ln N), worst case. In practice, QuickHull is usually a fast choice, although potentially O(N^2) worst case. The QHull implementation is convenient to call from the command line. The CGAL library provides a C++ implementation
Antipodal pairs on the convex hull are candidates for furthest points. One can search over the antipodal points using an algorithm like Rotating calipers in O(N) time.
The problem can be generalized to an "all-farthest pairs" problem: for each point
i
, find the most distant pointj
---we're now seeking N pairs of points. The solution again uses the convex hull, but now the second part can be done with a matrix searching algorithm.The distance from
A
toB
is the same as the distance fromB
toA
. You can easily modify the algorithm to eliminate half of the computations this way. It'll still beO(n^2)
but will be twice as fast.That is, instead of computing all off-diagonal elements of the distance matrix
P x P
:you can compute either the upper triangle:
or the lower triangle:
I'm not sure if putting the points into a spatial index and querying it leads to an O(n log n) algorithm.
See John Feminella's excellent answer to this similar question. You should get O(n log n) for the average case.