This question already has an answer here:
-
Merging tuples if they have one common element
5 answers
I would like to merge sets with common elements. For example
input = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([1,1000]),
frozenset([100, 200]), frozenset([100, 300, 400])])
Result:
set([frozenset([1,2,3,4,5,6,7,8, 1000]), frozenset([100,200,300,400])])
What would be an efficient way to achieve this?
Use the built-in set.union() -> only include results where the length of set.intersection() is greater than 0.
This answer explains at a high level how to implement what you want. If you want a specific solution, you'll have to show your efforts to solve the problem first.
Here are what I tried. Firstly, I did something simple but wrong
result = set()
while input:
r = set(input.pop())
for rr in input.copy():
if rr & r:
input.remove(rr)
r.update(rr)
result.add(frozenset(r))
The problem with this code is that the merging may not be complete, depending on the order of the input.pop()
. Suppose input={frozenset([0, 1]), frozenset([2,3]), frozenset([1,2])}
and the three elements get popped out and looped over in this assignment order, then results={frozenset([0,1,2]), frozenset([2,3])}
.
Then I implemented the answer in this post. It first builds an adjacency list of a new graph, each node corresponds to one frozenset
element of the input
. An edge exists between two nodes (frozenset
s) if they share common elements. Then depth first graph traversal is used to find the connected components of this new graph.
result, visited = set(), set()
components = collections.defaultdict(list)
adj_list = collections.defaultdict(list)
def dft(node, key):
visited.add(node)
components[key].append(node)
for neighbor in adj_list[node]:
if neighbor not in visited:
dft(neighbor, key)
for r1, r2 in itertools.combinations_with_replacement(input, 2):
if r1 & r2:
adj_list[r1].append(r2)
adj_list[r2].append(r1)
for node in adj_list:
if node not in visited:
dft(node, node)
for node, neighbors in components.iteritems():
result.add(node.union(*neighbors))