Currently, I am working on a project that is trying to group 3d points from a dataset by specifying connectivity as a minimum euclidean distance. My algorithm right now is simply a 3d adaptation of the naive flood fill.
size_t PointSegmenter::growRegion(size_t & seed, size_t segNumber) {
size_t numPointsLabeled = 0;
//alias for points to avoid retyping
vector<Point3d> & points = _img.points;
deque<size_t> ptQueue;
ptQueue.push_back(seed);
points[seed].setLabel(segNumber);
while (!ptQueue.empty()) {
size_t currentIdx = ptQueue.front();
ptQueue.pop_front();
points[currentIdx].setLabel(segNumber);
numPointsLabeled++;
vector<int> newPoints = _img.queryRadius(currentIdx, SEGMENT_MAX_DISTANCE, MATCH_ACCURACY);
for (int i = 0; i < (int)newPoints.size(); i++) {
int newIdx = newPoints[i];
Point3d &newPoint = points[newIdx];
if(!newPoint.labeled()) {
newPoint.setLabel(segNumber);
ptQueue.push_back(newIdx);
}
}
}
//NOTE to whoever wrote the other code, the compiler optimizes i++
//to ++i in cases like these, so please don't change them just for speed :)
for (size_t i = seed; i < points.size(); i++) {
if(!points[i].labeled()) {
//search for an unlabeled point to serve as the next seed.
seed = i;
return numPointsLabeled;
}
}
return numPointsLabeled;
}
Where this code snippet is ran again for the new seed, and _img.queryRadius() is a fixed radius search with the ANN library:
vector<int> Image::queryRadius(size_t index, double range, double epsilon) {
int k = kdTree->annkFRSearch(dataPts[index], range*range, 0);
ANNidxArray nnIdx = new ANNidx[k];
kdTree->annkFRSearch(dataPts[index], range*range, k, nnIdx);
vector<int> outPoints;
outPoints.reserve(k);
for(int i = 0; i < k; i++) {
outPoints.push_back(nnIdx[i]);
}
delete[] nnIdx;
return outPoints;
}
My problem with this code is that it runs waaaaaaaaaaaaaaaay too slow for large datasets. If I'm not mistaken, this code will do a search for every single point, and the searches are O(NlogN), giving this a time complexity of (N^2*log(N)).
In addition to that, deletions are relatively expensive if I remember right from KD trees, but also not deleting points creates problems in that each point can be searched hundreds of times, by every neighbor close to it.
So my question is, is there a better way to do this? Especially in a way that will grow linearly with the dataset?
Thanks for any help you may be able to provide
EDIT I have tried using a simple sorted list like dash-tom-bang said, but the result was even slower than what I was using before. I'm not sure if it was the implementation, or it was just simply too slow to iterate through every point and check euclidean distance (even when just using squared distance.
Is there any other ideas people may have? I'm honestly stumped right now.