Imagine we have many points which some of them are connected together and we want to cluster them.
Please take a look at the following figure.
If we have the "connectivity matrix" of points, how we can cluster them in two group (groups of connected points)?
ConnectivityMatrix=
[1 2
1 3
2 4
2 3
2 1
3 1
3 2
3 4
4 3
4 2
5 8
5 7
5 6
6 5
6 7
7 6
7 5
7 8
8 7
8 5]
The final result should be nodes of 1,2,3,4
in a first group(cluster) and nodes of 5,6,7,8
in a second group (cluster).
Here is some code to get you started. I basically implemented Depth First Search... a very crude implementation of it anyway.
Depth First Search is an algorithm that is used for the traversal of trees. Graphs are essentially a special case of trees where there are leaf nodes that connect back to the root. The basic algorithm for Depth First Search is as so:
- Start at the root of the tree and add this to a stack
- For each node that is connected to the root, add this onto the stack and place this in a list of visited nodes
- While there is still a node on the stack...
- Pop off a node off the stack
- Check the visited nodes list. If this is a node we have already visited, skip.
- Else, visit any nodes that are connected to this node we popped off and add to the stack
If we have disconnected graphs like what you have above, we basically run Depth First Search multiple times. Each time will be for one cluster. After one Depth First Search result, we will discover nodes that belong to one cluster. We restart Depth First Search again with any node that we have not touched yet, which will be a node from another cluster that we have not visited. As you clearly have two clusters in your graph structure, we will have to run Depth First Search two times. This is commonly referred to as finding all connected components in an overall graph.
To find the Connected Components, here are the steps that I did:
- Create a connectivity matrix
- Initialize a Boolean list that tells us whether or not we have visited a node in your graph
- Initialize an empty cluster list
- Initialize an empty stack that contains which nodes we should visit.
- While there is at least one node we need to visit...
- Find such a node
- Initialize our stack to contain this node
- While our stack is not empty
- Pop off a node from the stack
- If we have visited this node, continue
- Else mark as visited
- Retrieve all nodes connected to this node
- Remove those nodes that are not on the stack in (4)
- Add these nodes to the stack and the cluster list
- Once the stack is empty, we have a list of all of the nodes contained in a single cluster. Add this cluster to a final list.
- Repeat 1 - 6 until we visit all nodes
Without further ado, this is the code. Bear in mind this is not battle tested. If you have graph structures that will generate an error, that'll be on your own to fix :)
ConnectivityMatrix = [1 2
1 3
2 4
2 3
2 1
3 1
3 2
3 4
4 3
4 2
5 8
5 7
5 6
6 5
6 7
7 6
7 5
7 8
8 7
8 5];
%// Find all possible node IDs
nodeIDs = unique(ConnectivityMatrix(:));
%// Flag that tells us if there are any nodes we should visit
nodeIDList = false(1,numel(nodeIDs));
%// Stores our list of clusters
clusterList = {};
%// Keeps track of how many clusters we have
counter = 1;
%// Stack - initialize to empty
stackNodes = [];
%// While there is at least one node we need to visit
while (~all(nodeIDList))
% Find any node
stackNodes = find(nodeIDList == false, 1);
% Initialize our stack to contain this node
nodesCluster = stackNodes;
%// While our stack is not empty
while (~isempty(stackNodes))
% Grab the node off the stack and pop off
node = stackNodes(end);
stackNodes(end) = [];
%// If we have marked this node as visited, skip
if (nodeIDList(node))
continue;
end
%// Mark as visited
nodeIDList(node) = true;
%// Retrieve all nodes connected to this node
connectedNodes = ConnectivityMatrix(ConnectivityMatrix(:,1) == node, :);
nodesToVisit = unique(connectedNodes(:,2).');
%// Remove those already visited
visitedNodes = ~nodeIDList(nodesToVisit);
finalNodesToVisit = nodesToVisit(visitedNodes);
%// Add to cluster
nodesCluster = unique([nodesCluster finalNodesToVisit]);
%// Add to stack
stackNodes = unique([stackNodes finalNodesToVisit]);
end
%// Add connected components to its own cluster
clusterList{counter} = nodesCluster;
counter = counter + 1;
end
Once we have run this code, we can display our clusters like so:
celldisp(clusterList)
clusterList{1} =
1 2 3 4
clusterList{2} =
5 6 7 8
As such, cluster #1 contains nodes 1,2,3,4 while cluster #2 contains nodes 5,6,7,8.
Bear in mind that this code will only work if you sequentially label your nodes as you did in your diagram. You can't skip any label numbers (i.e. you can't do 1,2,4,6,9, etc. This should be 1,2,3,4,5).
Good luck!
You can use "off-the-shelf" Matlab commands for this problem. For example, you can use graphconncomp
.
The answer from rayryeng is pretty good. However, here are some details I would like to point out:
- "
numel(nodeIDs)
" would bring in possible errors if your node label is not sequentially (i.e., you have 10 nodes and the maximum node label is 20). You could switch "numel(nodeIDs)
" to "max(nodeIDs)
or a larger value."
I also believe the following code would bring in some problems (i.e., some nodes are missing and become isolated nodes) when utilizing this function:
connectedNodes = ConnectivityMatrix(ConnectivityMatrix(:,1) == node, :);
nodesToVisit = unique(connectedNodes(:,2).');
I modified the simple two lines with the following messy code:
connectedNodes1 = ConnectivityMatrix (ConnectivityMatrix (:,1) == node, :);
connectedNodes2 = ConnectivityMatrix (ConnectivityMatrix (:,2) == node, :);
AC=connectedNodes1(:,2).';
AD=connectedNodes2(:,1).';
ACA=reshape(AC,[],1);
ADA=reshape(AD,[],1);
AE= [ACA; ADA] ;
AEA=reshape(AE,[],1);
AEA=AEA';
nodesToVisit = unique(AEA);
After modifying these two points, there are no problems with rayryeng's initial code.