可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Currently, I am working on a project that is trying to group 3d points from a dataset by specifying connectivity as a minimum euclidean distance. My algorithm right now is simply a 3d adaptation of the naive flood fill.

size_t PointSegmenter::growRegion(size_t & seed, size_t segNumber) {
    size_t numPointsLabeled = 0;

    //alias for points to avoid retyping
    vector<Point3d> & points = _img.points;
    deque<size_t> ptQueue;
    ptQueue.push_back(seed);
    points[seed].setLabel(segNumber);
    while (!ptQueue.empty()) {
        size_t currentIdx = ptQueue.front();
        ptQueue.pop_front();
        points[currentIdx].setLabel(segNumber);
        numPointsLabeled++;
        vector<int> newPoints = _img.queryRadius(currentIdx, SEGMENT_MAX_DISTANCE, MATCH_ACCURACY);
        for (int i = 0; i < (int)newPoints.size(); i++) {
            int newIdx = newPoints[i];
            Point3d &newPoint = points[newIdx];
            if(!newPoint.labeled()) {
                newPoint.setLabel(segNumber);
                ptQueue.push_back(newIdx);
            }
        }
    }

    //NOTE to whoever wrote the other code, the compiler optimizes i++ 
    //to ++i in cases like these, so please don't change them just for speed :)
    for (size_t i = seed; i < points.size(); i++) {
        if(!points[i].labeled()) {
            //search for an unlabeled point to serve as the next seed.
            seed = i;
            return numPointsLabeled;
        }
    }
    return numPointsLabeled;
}

Where this code snippet is ran again for the new seed, and _img.queryRadius() is a fixed radius search with the ANN library:

vector<int> Image::queryRadius(size_t index, double range, double epsilon) {
    int k = kdTree->annkFRSearch(dataPts[index], range*range, 0);
    ANNidxArray nnIdx = new ANNidx[k];
    kdTree->annkFRSearch(dataPts[index], range*range, k, nnIdx);
    vector<int> outPoints;
    outPoints.reserve(k);
    for(int i = 0; i < k; i++) {
        outPoints.push_back(nnIdx[i]);
    }
    delete[] nnIdx;
    return outPoints;
}

My problem with this code is that it runs waaaaaaaaaaaaaaaay too slow for large datasets. If I'm not mistaken, this code will do a search for every single point, and the searches are O(NlogN), giving this a time complexity of (N^2*log(N)).

In addition to that, deletions are relatively expensive if I remember right from KD trees, but also not deleting points creates problems in that each point can be searched hundreds of times, by every neighbor close to it.

So my question is, is there a better way to do this? Especially in a way that will grow linearly with the dataset?

Thanks for any help you may be able to provide

EDIT I have tried using a simple sorted list like dash-tom-bang said, but the result was even slower than what I was using before. I'm not sure if it was the implementation, or it was just simply too slow to iterate through every point and check euclidean distance (even when just using squared distance.

Is there any other ideas people may have? I'm honestly stumped right now.

回答1:

I propose the following algorithm:

Compute 3D Delaunay triangulation of your data points.
Remove all the edges that are longer than your threshold distance, O(N) when combined with step 3.
Find connected components in the resulting graph which is O(N) in size, this is done in O(N α(N)).

The bottleneck is step 1 which can be done in O(N²) or even O(N log N) according to this page http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/lattuada_roberto/paper.html. However it's definitely not a 100 lines algorithm.

回答2:

When I did something along these lines, I chose an "origin" outside of the dataset somewhere and sorted all of the points by their distance to that origin. Then I had a much smaller set of points to choose from at each step, and I only had to go through the "onion skin" region around the point being considered. You would check neighboring points until the distance to the closest point is less than the width of the range you're checking.

While that worked well for me, a similar version of that can be achieved by sorting all of your points along one axis (which would represent the "origin" being infinitely far away) and then just checking points again until your "search width" exceeds the distance to the closest point so far found.

回答3:

Points should be better organized. To search more efficiently instead of a vector<Point3d> you need some sort of a hash map where hash collision implies that two points are close to each other (so you use hash collisions to your advantage). You can for instance divide the space into cubes of size equal to SEGMENT_MAX_DISTANCE, and use a hash function that returns a triplet of ints instead of just an int, where each part of a triplet is calculated as point.<corresponding_dimension> / SEGMENT_MAX_DISTANCE.

Now for each point in this new set you search only for points in the same cube, and in adjacent cubes of space. This greatly reduces the search space.