Given a list of N points [(x_1,y_1), (x_2,y_2), ... ], I am trying to find the nearest neighbours of each point based on distance. My dataset is too large for a brute-force approach, so a KD-tree seems best.
Rather than implement one from scratch, I see that sklearn.neighbors.KDTree can find nearest neighbours. Can it be used to find the nearest neighbour of each point, i.e. return a list of length N?
This question is very broad and missing details. It's unclear what you have tried, what your data looks like, and what counts as a nearest neighbour (is the point itself included?).
Assuming you are not interested in the identity (with distance 0), you can query the two nearest-neighbors and drop the first column. This is probably the easiest approach here.
Code:
Output
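The two-nearest-neighbours approach described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the five sample points and the variable names (X, nn_dist, nn_ind) are made up for illustration, and it assumes the data contains no duplicate points.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Made-up sample data: five 2D points, one per row.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [10.0, 10.0]])

tree = KDTree(X)

# Ask for k=2 because each point's closest match is itself (distance 0).
dist, ind = tree.query(X, k=2)

# Drop the first column (the point itself) to keep the actual nearest neighbour.
nn_dist = dist[:, 1]
nn_ind = ind[:, 1]

print(nn_ind)   # index of the nearest neighbour of each point
print(nn_dist)  # distance to that neighbour
```

If the data does contain duplicate points, the first column is not guaranteed to be the query point itself, so the result would need an explicit check against each point's own index.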
You can use the query_radius() method of sklearn.neighbors.KDTree, which returns a list of the indices of the nearest neighbours within some radius (as opposed to returning the k nearest neighbours).
Outputs:
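A minimal sketch of the radius query, assuming five made-up points on a diagonal and an arbitrary radius of 1.5 (the names points, all_nn_indices and the leaf_size value are illustrative choices, only all_nns is taken from the text):

```python
import numpy as np
from sklearn.neighbors import KDTree

# Made-up sample data: consecutive points are sqrt(2) ~ 1.41 apart.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
tree = KDTree(np.array(points), leaf_size=2)

# One array of neighbour indices per query point, for neighbours within r.
all_nn_indices = tree.query_radius(np.array(points), r=1.5)

# Map the indices back to the actual points.
all_nns = [[points[idx] for idx in nn_indices] for nn_indices in all_nn_indices]

for nns in all_nns:
    print(nns)
```

The order of the indices within each neighbour list is not sorted by distance here, so the printed order may vary.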
Note that each point includes itself in its list of nearest neighbours within the given radius. If you want to remove these identity points, the line computing all_nns can be changed to:
Resulting in:
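One way to drop the identity points is to skip any neighbour index equal to the index of the query point. The sketch below is self-contained and uses the same made-up points and radius as an assumption; only the all_nns line differs from a plain radius query.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Made-up sample data: consecutive points are sqrt(2) ~ 1.41 apart.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
tree = KDTree(np.array(points), leaf_size=2)
all_nn_indices = tree.query_radius(np.array(points), r=1.5)

# Skip idx == i so that a point is never listed as its own neighbour.
all_nns = [
    [points[idx] for idx in nn_indices if idx != i]
    for i, nn_indices in enumerate(all_nn_indices)
]

for nns in all_nns:
    print(nns)
```

Filtering by index rather than by coordinates keeps genuine duplicates of a point in its neighbour list.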
The sklearn implementation is probably the best choice. I wrote the code below some time ago, when I needed a custom distance function. (I guess sklearn's KDTree does not support a custom distance metric.) Adding it for reference.
Adapted from my gist for 2D https://gist.github.com/alexcpn/1f187f2114976e748f4d3ad38dea17e8
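For illustration only, a hand-written 2D KD-tree with a pluggable distance function might be sketched as below. This is not the code from the gist: the function names (build_kdtree, nearest, euclidean, manhattan) are hypothetical, and the pruning step assumes the distance is never smaller than the absolute difference along a single axis, which holds for Euclidean and Manhattan distance but not for arbitrary functions.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a 2D KD-tree as nested dicts (None for an empty subtree)."""
    if not points:
        return None
    axis = depth % 2  # alternate between x (0) and y (1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nearest(node, target, dist=euclidean, depth=0, best=None):
    """Return (distance, point) for the stored point closest to target.

    Points equal to target are skipped, so a point is never its own neighbour.
    """
    if node is None:
        return best
    axis = depth % 2
    point = node["point"]
    if point != target:
        d = dist(point, target)
        if best is None or d < best[0]:
            best = (d, point)
    diff = target[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, dist, depth + 1, best)
    # Search the far side only if the splitting line is closer than the best match so far.
    if best is None or abs(diff) < best[0]:
        best = nearest(far, target, dist, depth + 1, best)
    return best

# Made-up sample data.
points = [(0, 0), (1, 0), (5, 5), (6, 5), (10, 10)]
tree = build_kdtree(points)
neighbours = [nearest(tree, p) for p in points]  # one (distance, point) pair per input point
```

Passing dist=manhattan to nearest switches the metric without rebuilding the tree, since the tree structure depends only on the coordinates.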