Find all tuples related to a certain string in Pyt

Posted 2020-03-29 09:19

Question:

I am trying to find all tuples related to a string, not just the ones that match it directly. Here is what I have so far:

from itertools import chain

data = [('A','B'),('B','C'),('B','D'),('B','F'),('F','W'),('W','H'),('G','Z')]
init = 'A'

filtered_init = [item for item in data if item[0] == init or item[1] == init]
elements = list(dict.fromkeys([ i for i in chain(*filtered_init)]))
elements.remove(init)

dat = []
for i in elements:
    sync = [item for item in data if item[0] == i or item[1] == i]
    dat.append(sync)

print(dat)

The result is:

[[('A', 'B'), ('B', 'C'), ('B', 'D'), ('B', 'F')]]

However, this only contains the tuples connected directly to A or B. What I want is every tuple related to the init string, whether directly or through other tuples.

In other words, [('A','B'),('B','C'),('B','D'),('B','F'),('F','W'),('W','H')]. That is, I want to find all edges reachable from init. How can I get them?

Answer 1:

Your problem is to find the connected component of init in an undirected graph defined by an edge list data structure.

This data structure is not very convenient to use for this problem, so the first step is to transform it into an adjacency list. From there, we can apply any standard graph traversal algorithm, such as depth first search. Once we're done, we can transform the result back into the edge list format you want for your output.

from collections import defaultdict

def find_connected_component(edge_list, start):
    # convert to adjacency list
    edges = defaultdict(list)
    for a, b in edge_list:
        edges[a].append(b)
        edges[b].append(a)

    # depth-first search
    stack = [start]
    seen = set()

    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])

    # convert back to edge list
    return [ edge for edge in edge_list if edge[0] in seen ]

Usage:

>>> find_connected_component(data, init)
[('A', 'B'), ('B', 'C'), ('B', 'D'), ('B', 'F'), ('F', 'W'), ('W', 'H')]
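
Any other standard traversal works equally well here. As a rough sketch, this is a breadth-first variant of the same idea. The name find_connected_component_bfs is made up for this illustration, and data is repeated from the question so that the snippet runs on its own:

```python
from collections import defaultdict, deque

def find_connected_component_bfs(edge_list, start):
    # hypothetical name; build an undirected adjacency list from the edge list
    neighbours = defaultdict(list)
    for a, b in edge_list:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # breadth-first search: visit vertices in order of distance from start
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in seen:
                seen.add(other)
                queue.append(other)

    # keep only the edges whose endpoints were reached
    return [edge for edge in edge_list if edge[0] in seen]

data = [('A','B'),('B','C'),('B','D'),('B','F'),('F','W'),('W','H'),('G','Z')]
print(find_connected_component_bfs(data, 'A'))
```

Marking a vertex as seen when it is queued, rather than when it is popped, keeps each vertex in the queue at most once.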


Answer 2:

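For a more efficient approach, you could use a DSU (disjoint-set union, also known as union-find). With path compression, this solution runs in close to O(N) time, where N is the number of edges.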

import random

parent = {}
init = 'A'
data = [('A','B'),('B','C'),('B','D'),('B','F'),('F','W'),('W','H'),('G','Z')]

def make_set(v):
    # each vertex starts as the root of its own set
    parent[v] = v

def find_set(v):
    # return the root of v's set, compressing the path along the way
    if v == parent[v]:
        return v
    parent[v] = find_set(parent[v])
    return parent[v]

def union_sets(a, b):
    # merge the sets that contain a and b
    a, b = map(find_set, [a, b])
    if a != b:
        # choose the new root at random
        if random.randint(0, 1):
            a, b = b, a
        parent[b] = a

# every vertex that appears in some edge
elements = {v for edge in data for v in edge}

for v in elements:
    make_set(v)

for u, v in data:
    union_sets(u, v)

# keep the edges that belong to the same set as init
init_set = find_set(init)
edges_in_answer = [e for e in data if find_set(e[0]) == init_set]
print(edges_in_answer)

Output: [('A', 'B'), ('B', 'C'), ('B', 'D'), ('B', 'F'), ('F', 'W'), ('W', 'H')]
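
A recursive find can, in principle, run into Python's recursion limit if a long parent chain builds up before it is compressed. The following is a minimal sketch of an iterative variant with union by size, assuming the same edge-list input. The names component_edges, find, union, and size are invented for this illustration:

```python
def component_edges(edge_list, start):
    # hypothetical helper: returns the edges in the same component as start
    parent = {}
    size = {}

    def find(v):
        # first pass: walk up to the root
        root = v
        while parent[root] != root:
            root = parent[root]
        # second pass: path compression, point each visited node at the root
        while parent[v] != root:
            nxt = parent[v]
            parent[v] = root
            v = nxt
        return root

    def union(a, b):
        a, b = find(a), find(b)
        if a == b:
            return
        # union by size: attach the smaller tree under the larger one
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a
        size[a] += size[b]

    for u, v in edge_list:
        # register vertices the first time they appear
        for x in (u, v):
            if x not in parent:
                parent[x] = x
                size[x] = 1
        union(u, v)

    if start not in parent:
        return []
    root = find(start)
    return [e for e in edge_list if find(e[0]) == root]

data = [('A','B'),('B','C'),('B','D'),('B','F'),('F','W'),('W','H'),('G','Z')]
print(component_edges(data, 'A'))
```

Keeping parent and size local to the function avoids the module-level state, so the helper can be called on several edge lists without resetting anything.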



Answer 3:

A very naive solution that might not be efficient for complicated trees.

data = [('A', 'B'), ('B', 'C'), ('B', 'D'), ('B', 'F'),
        ('F', 'W'), ('W', 'H'), ('G', 'Z')]
init = 'A'

remaining = list(data)   # work on a copy so that data is left untouched
frontier = {init}        # vertices reached in the previous pass
result = []
while frontier:
    reached = set()
    unused = []
    for edge in remaining:
        # an edge is related if either endpoint is in the frontier
        if edge[0] in frontier or edge[1] in frontier:
            result.append(edge)
            reached.update(edge)
        else:
            unused.append(edge)
    remaining = unused
    frontier = reached - frontier
print(result)
# [('A', 'B'), ('B', 'C'), ('B', 'D'), ('B', 'F'), ('F', 'W'), ('W', 'H')]