The Issue
Given the following network of nodes and edges, I would like to derive all possible groupings of nodes where all nodes within a group are connected to all other nodes within that group via an edge.
In this network...
- nodes 'B', 'C', and 'F' would be in a group as they are fully interconnected
- 'A' would only belong in a group with itself.
- 'D' and 'B' would be in a group together, but 'D' would not belong in the group with 'B', 'C', and 'F' because it is not connected directly to 'C' and 'F' via an edge.
In other words, the rules are as follows:
All members of a group must be connected to all other members of that group directly via an edge.
An object may be a member of multiple groups.
No redundant groups. If a group can fit within a larger group, it is not a group. (Ex. 'B' and 'C' do not comprise a valid group on their own because they both fit within the larger group of 'B', 'C', and 'F'). An object may only be in a singular group (ex. A-A) if it belongs to no other groups.
I have represented the network above as a dataframe where each row represent pairs of nodes (x1 and x2) bound by an edge:
x1 <- c("A", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "F", "F", "F")
x2 <- c("A", "B", "C", "D", "F", "B", "C", "F", "B", "D", "E", "D", "E", "B", "C", "F")
df <- data.frame(x1, x2)
Given this df, I would like to derive the following valid groups (provided in visual as well as data frame form):
1 2 3 4
1 A B B D
2 NULL C D E
3 NULL F NULL NULL
**Note: the order of groups/group names is irrelevant.
What I have tried
I have attempted to loop through a list of each unique node name in column x1 of df to identify all nodes that each node is connected to. I then use this information to generate group rosters. However, these group rosters are sometimes invalidated by violating rule 1. Here is what I have thus far...
n <- nrow(as.data.frame(unique(df$x1)))
RosterGuide <- as.data.frame(matrix(nrow = n , ncol = 1))
RosterGuide$V1 <- seq.int(nrow(RosterGuide))
RosterGuide$Object <- (unique(df$x1))
colnames(RosterGuide) <- c("V1","Object")
groups_frame <- matrix(, ncol= length(n), nrow = length(n))
for (loopItem in 1:nrow(RosterGuide)) {
object <- subset(RosterGuide$Object, RosterGuide$V1 == loopItem)
group <- as.data.frame(subset(df$x2, df$x1 == object))
groups_frame <- cbind.fill(group, groups_frame, fill = "NULL")
}
Groups <- as.data.frame(groups_frame)
Groups <- subset(Groups, select = - c(object))
colnames(Groups) <- RosterGuide$V1
... this loop yields the data frame 'Groups'...
1 2 3 4 5 6
1 B D B B B A
2 C E D C C NULL
3 F NULL E F D NULL
4 NULL NULL NULL NULL F NULL
This is where I am at. You can see that group 3 violates the first rule because 'B' and 'E' are not directly connected by an edge, group 5 violates the first rule because 'F' and 'D' and 'F' and 'C' are not directly connected via an edge, and group 4 violates the third rule because it is a duplication of group 1 (I am less worried about 3rd rule violations, I can solve that one easily).
I am at a loss in trying to get from the data frame 'Groups' to the valid output I suggested above in a way that is universal to any data frame like df (2 columns, infinite rows) that describes the nodes and edges of a network of any size.