A fast way to find connected component in a 1-NN g

2019-04-08 23:06发布

问题:

First of all, I got a N*N distance matrix, for each point, I calculated its nearest neighbor, so we had a N*2 matrix, It seems like this:

0 -> 1  
1 -> 2  
2 -> 3  
3 -> 2  
4 -> 2  
5 -> 6  
6 -> 7  
7 -> 6  
8 -> 6  
9 -> 8

the second column was the nearest neighbor's index. So this was a special kind of directed graph, with each vertex had and only had one out-degree.

Of course, we could first transform the N*2 matrix to a standard graph representation, and perform BFS/DFS to get the connected components.

But, given the characteristic of this special graph, is there any other fast way to do the job ?

I will be really appreciated.

Update:

I've implemented a simple algorithm for this case here.

Look, I did not use a union-find algorithm, because the data structure may make things not that easy, and I doubt whether It's the fastest way in my case(I meant practically).

You could argue that the _merge process could be time consuming, but if we swap the edges into the continuous place while assigning new label, the merging may cost little, but it need another N spaces to trace the original indices.

回答1:

The fastest algorithm for finding connected components given an edge list is the union-find algorithm: for each node, hold the pointer to a node in the same set, with all edges converging to the same node, if you find a path of length at least 2, reconnect the bottom node upwards.

This will definitely run in linear time:

- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
    and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).

Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:

for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop, 
   collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets

The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm is its optimization. This delays the optimization to the second step but removes the first step completely.

You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.



回答2:

Since each node has only one outgoing edge, you can just traverse the graph one edge at a time until you get to a vertex you've already visited. An out-degree of 1 means any further traversal at this point will only take you where you've already been. The traversed vertices in that path are all in the same component.

In your example:

0->1->2->3->2, so [0,1,2,3] is a component
4->2, so update the component to [0,1,2,3,4]
5->6->7->6, so [5,6,7] is a component
8->6, so update the compoent to [5,6,7,8]
9->8, so update the compoent to [5,6,7,8,9]

You can visit each node exactly once, so time is O(n). Space is O(n) since all you need is a component id for each node, and a list of component ids.



回答3:

If you want to do it sequencially you can do it using weighted quick union and path compression .Complexity O(N+Mlog(log(N))).check this link . Here is the pseudocode .honoring @pycho 's words

`

 public class QuickUnion
    {
    private int[] id;

    public QuickUnion(int N)
    {
         id = new int[N];
         for (int i = 0; i < N; i++) id[i] = i;
    }

    public int root(int i)
    {
         while (i != id[i])
         {
             id[i] = id[id[i]];
             i = id[i];
         }
         return i;
    }

    public boolean find(int p, int q)
    {
        return root(p) == root(q);
    }

    public void unite(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }

    }

` @reference https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

If you want to find connected components parallely, the asymptotic complexity can be reduced to O(log(log(N)) time using pointer jumping and weighted quick union with path compression. Check this link

https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/