I have a point cloud C, where each point has an associated value. Lets say the points are in 2-d space, so each point can be represented with the triplet (x, y, v).
I'd like to find the subset of points which are local maxima. That is, for some radius R, I would like to find the subset of points S in C such that for any point Pi (with value vi) in S, there is no point Pj in C within R distance of Pi whose value vj is greater that vi.
I see how I could do this in O(N^2) time, but that seems wasteful. Is there an efficient way to do this?
Side Notes:
- The source of this problem is that I'm trying to find local maxima in a sparse matrix, so in my case x, y are ordered integer indeces - if this simplifies the problem let me know!
- I'm perfectly happy if the solution is just for a manhattan distance or whatever.
- I'm in python, so if there's some kind of nice vectorized numpy way to do this that's just great.
Use a 2D-tree (2D instance of a kD-tree). After N.Log(N) time preprocessing, It will allow you to perform fixed-radius near-neighbor searches around all your points in about Log(N) + K time (K neighbors found on average), for a total of N.Log(N)+ K.N. It will perfectly live with the Manhattan distance.
Following up on Yves' suggestion, here's an answer, which uses scipy's KDTree:
from scipy.spatial.kdtree import KDTree
import numpy as np
def locally_extreme_points(coords, data, neighbourhood, lookfor = 'max', p_norm = 2.):
'''
Find local maxima of points in a pointcloud. Ties result in both points passing through the filter.
Not to be used for high-dimensional data. It will be slow.
coords: A shape (n_points, n_dims) array of point locations
data: A shape (n_points, ) vector of point values
neighbourhood: The (scalar) size of the neighbourhood in which to search.
lookfor: Either 'max', or 'min', depending on whether you want local maxima or minima
p_norm: The p-norm to use for measuring distance (e.g. 1=Manhattan, 2=Euclidian)
returns
filtered_coords: The coordinates of locally extreme points
filtered_data: The values of these points
'''
assert coords.shape[0] == data.shape[0], 'You must have one coordinate per data point'
extreme_fcn = {'min': np.min, 'max': np.max}[lookfor]
kdtree = KDTree(coords)
neighbours = kdtree.query_ball_tree(kdtree, r=neighbourhood, p = p_norm)
i_am_extreme = [data[i]==extreme_fcn(data[n]) for i, n in enumerate(neighbours)]
extrema, = np.nonzero(i_am_extreme) # This line just saves time on indexing
return coords[extrema], data[extrema]
I found this solution, but it's probably O(N^2):
import numpy as np
# generate test data
n = 10
foo = np.random.rand(n,n)
# fixed test data for visual back-checking
# foo = np.array([[ 0.12439309, 0.88878825, 0.21675684, 0.21422532, 0.7016789 ],
# [ 0.14486462, 0.40642871, 0.4898418 , 0.41611303, 0.12764404],
# [ 0.41853585, 0.22216484, 0.36113181, 0.5708699 , 0.3874901 ],
# [ 0.24314391, 0.22488507, 0.22054467, 0.25387521, 0.46272496],
# [ 0.99097341, 0.76083447, 0.37941783, 0.932519 , 0.9668254 ]])
# list to collect local maxima
local_maxima = []
# distance in x / y to define region of interest around current center coordinate
# roi = 1 corresponds to a region of interest of 3x3 (except at borders)
roi = 1
# give pseudo-coordinates
x,y = np.meshgrid(range(foo.shape[0]), range(foo.shape[1]))
for i in range(foo.shape[0]):
for j in range(foo.shape[1]):
x0 = x[i,j]
y0 = y[i,j]
z0 = foo[i,j]
# index calculation to avoid out-of-bounds error when taking sub-matrix
mask_x = abs(x - x0) <= roi
mask_y = abs(y - y0) <= roi
mask = mask_x & mask_y
if np.max(foo[mask]) == z0:
local_maxima.append((i, j))
print local_maxima
It's all about defining sliding windows/filters over your matrix. All other solutions coming to my mind are rather pointing to absolute maxima (like e.g. histogramming)...
However I hope my ansatz is useful to some extent...
EDIT:
here another solution which should be faster than the first, but still O(N^2), and it does not depend on rectilinear gridded data:
import numpy as np
# generate test data
# points = np.random.rand(10,3)
points = np.array([[ 0.08198248, 0.25999721, 0.07041999],
[ 0.19091977, 0.05404123, 0.25826508],
[ 0.8842875 , 0.90132467, 0.50512316],
[ 0.33320528, 0.74069399, 0.36643752],
[ 0.27789568, 0.14381512, 0.13405309],
[ 0.73586202, 0.4406952 , 0.52345838],
[ 0.76639731, 0.70796547, 0.70692905],
[ 0.09164532, 0.53234394, 0.88298593],
[ 0.96164975, 0.60700481, 0.22605181],
[ 0.53892635, 0.95173308, 0.22371167]])
# list to collect local maxima
local_maxima = []
# distance in x / y to define region of interest around current center coordinate
radius = 0.25
for i in range(points.shape[0]):
# radial mask with radius radius, could be beautified via numpy.linalg
mask = np.sqrt((points[:,0] - points[i,0])**2 + (points[:,1] - points[i,1])**2) <= radius
# if current z value equals z_max in current region of interest, append to result list
if points[i,2] == np.max(points[mask], axis = 0)[2]:
local_maxima.append(tuple(points[i]))
Result:
local_maxima = [
(0.19091976999999999, 0.054041230000000003, 0.25826507999999998),
(0.33320527999999999, 0.74069399000000002, 0.36643752000000002),
(0.73586202000000001, 0.44069520000000001, 0.52345838),
(0.76639731, 0.70796546999999999, 0.70692904999999995),
(0.091645320000000002, 0.53234393999999996, 0.88298593000000003),
(0.53892635, 0.95173308000000001, 0.22371167)
]