First of all, I got a N*N distance matrix, for each point, I calculated its nearest neighbor, so we had a N*2 matrix, It seems like this:
0 -> 1 1 -> 2 2 -> 3 3 -> 2 4 -> 2 5 -> 6 6 -> 7 7 -> 6 8 -> 6 9 -> 8
the second column was the nearest neighbor's index. So this was a special kind of directed graph, with each vertex had and only had one out-degree.
Of course, we could first transform the N*2 matrix to a standard graph representation, and perform BFS/DFS to get the connected components.
But, given the characteristic of this special graph, is there any other fast way to do the job ?
I will be really appreciated.
Update:
I've implemented a simple algorithm for this case here.
Look, I did not use a union-find algorithm, because the data structure may make things not that easy, and I doubt whether It's the fastest way in my case(I meant practically).
You could argue that the _merge process could be time consuming, but if we swap the edges into the continuous place while assigning new label, the merging may cost little, but it need another N spaces to trace the original indices.
Since each node has only one outgoing edge, you can just traverse the graph one edge at a time until you get to a vertex you've already visited. An out-degree of 1 means any further traversal at this point will only take you where you've already been. The traversed vertices in that path are all in the same component.
In your example:
You can visit each node exactly once, so time is O(n). Space is O(n) since all you need is a component id for each node, and a list of component ids.
If you want to do it sequencially you can do it using weighted quick union and path compression .Complexity O(N+Mlog(log(N))).check this link . Here is the pseudocode .honoring @pycho 's words
`
` @reference https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
If you want to find connected components parallely, the asymptotic complexity can be reduced to O(log(log(N)) time using pointer jumping and weighted quick union with path compression. Check this link
https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/
The fastest algorithm for finding connected components given an edge list is the union-find algorithm: for each node, hold the pointer to a node in the same set, with all edges converging to the same node, if you find a path of length at least 2, reconnect the bottom node upwards.
This will definitely run in linear time:
Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:
The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm is its optimization. This delays the optimization to the second step but removes the first step completely.
You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.