Merging sets with common elements? [duplicate]

2019-05-22 23:11发布

问题:

This question already has an answer here:

  • Merging tuples if they have one common element 5 answers

I would like to merge sets with common elements. For example

input = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]),  frozenset([1,1000]),
             frozenset([100, 200]), frozenset([100, 300, 400])])

Result:

set([frozenset([1,2,3,4,5,6,7,8, 1000]), frozenset([100,200,300,400])])

What would be an efficient way to achieve this?

回答1:

Use the built-in set.union() -> only include results where the length of set.intersection() is greater than 0.


This answer explains at a high level how to implement what you want. If you want a specific solution, you'll have to show your efforts to solve the problem first.



回答2:

Here are what I tried. Firstly, I did something simple but wrong

result = set()
while input:                                                                   
    r = set(input.pop())
    for rr in input.copy():
        if rr & r:                                                  
            input.remove(rr)
            r.update(rr)  
    result.add(frozenset(r))   

The problem with this code is that the merging may not be complete, depending on the order of the input.pop(). Suppose input={frozenset([0, 1]), frozenset([2,3]), frozenset([1,2])} and the three elements get popped out and looped over in this assignment order, then results={frozenset([0,1,2]), frozenset([2,3])}.

Then I implemented the answer in this post. It first builds an adjacency list of a new graph, each node corresponds to one frozenset element of the input. An edge exists between two nodes (frozensets) if they share common elements. Then depth first graph traversal is used to find the connected components of this new graph.

result, visited = set(), set()
components = collections.defaultdict(list)
adj_list = collections.defaultdict(list)

def dft(node, key):
    visited.add(node)
    components[key].append(node)
    for neighbor in adj_list[node]:
        if neighbor not in visited:
            dft(neighbor, key)

for r1, r2 in itertools.combinations_with_replacement(input, 2):
    if r1 & r2:
        adj_list[r1].append(r2)
        adj_list[r2].append(r1)
for node in adj_list:
    if node not in visited:
        dft(node, node)

for node, neighbors in components.iteritems():
    result.add(node.union(*neighbors))


标签: python set