In Python, count unique key/value pairs in a dicti

2019-09-08 14:56发布

问题:

I have a dictionary that is made with a list of values. Some of these values are also keys or values in other key/value pairs in the dictionary. I would simply like to count how many of these unique pairs there are in the dictionary.

Ex. dict = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'bob':['cat','joe'],'hardy':['dog']}

I need to count the number of key/value pairs that do not have share a key/value with another in the dict. For example the above should count to only 2, those connected to dog and cat. Even though milo is unique to dog, dog is also in the key/value pair 'hardy' and both of these should therefore be counted together (ie, only 1). (See comments below) I have tried to go about it by replacing a key (key A) that exists in the values of another key (key B) with 'key B', without success however as I cannot specify key B correctly.

for keys, values in dict.iteritems():

    for key,value in dict.iteriterms():
            if key in values:
                dict[keys] = dict.pop(key)

Is there an easier method? Thanks in advance...

回答1:

If I understand the problem correctly, your dictionary is the adjacency map of a graph and you're trying to find the sets of connected components. The regular algorithm (using a depth- or breadth-first search) may not work correctly since your graph is not undirected (e.g. you have edges from "bob" and "cat" to "joe", but none coming out from "joe").

Instead, I suggest using a disjoint set data structure. It's not hard to build one using a dictionary to handle the mapping of values to parents. Here's an implementation I wrote for a previous question:

class DisjointSet:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, element):
        if element not in self.parent: # leader elements are not in `parent` dict
            return element
        leader = self.find(self.parent[element]) # search recursively
        self.parent[element] = leader # compress path by saving leader as parent
        return leader

    def union(self, leader1, leader2):
        rank1 = self.rank.get(leader1,0)
        rank2 = self.rank.get(leader2,0)

        if rank1 > rank2: # union by rank
            self.parent[leader2] = leader1
        elif rank2 > rank1:
            self.parent[leader1] = leader2
        else: # ranks are equal
            self.parent[leader2] = leader1 # favor leader1 arbitrarily
            self.rank[leader1] = rank1+1 # increment rank

And here's how you could use it to solve your problem:

djs = DisjointSet()
all_values = set()
for key, values in my_dict.items():
    all_values.add(key)
    all_values.update(values)
    for val in values:
        l1 = djs.find(key)
        l2 = djs.find(val)
        if l1 != l2:
            djs.union(l1, l2)

roots = {djs.find(x) for x in all_values}
print("The number of disjoint sets is:", len(roots))

The first part of this code does two things. First it builds a set with all the unique nodes found anywhere in the graph. Secondly, it combines the nodes into disjoint sets by doing a union wherever there's an edge.

The second step is to build up a set of "root" elements from the disjoint set.



回答2:

Here is one possible solution:

values = {'dog':['milo','otis','laurel','hardy'],
          'cat':['bob','joe'],
          'milo':['otis','laurel','hardy','dog'],
          'bob':['cat','joe'],
          'hardy':['dog']}

result = []

for x in values.iteritems():
    y = set([x[0]] + x[1])
    if not any([z for z in result if z.intersection(y)]):
        result.append(y)

print len(result)

Note that you shouldn't call a variable dict because you're shadowing the built-in type dict.

Your goal is unclear, but you can modify the construction of the y set to meet your needs.



回答3:

If I understand your question correctly, you are trying to describe a graph-like structure, and you're looking at whether the keys appear in a value list. Since you are only interested in count, you don't have to worry about future value lists, when iterating through the dict, so this should work:

d = {'dog': ['milo','otis','laurel','hardy'],'cat': ['bob','joe'],'milo': 'otis','laurel','hardy','dog'], 'bob': ['cat','joe'], 'hardy': ['dog']}
seen = set()
unique = []
for key, values in d.iteritems():
    if key not in seen:
        unique.append(key)
    seen = seen.union(values)
print(len(unique))

Note that the actual values contained in unique are dependent on dict ordering, are are only keys, not values. If you are actually trying to some sort of network or graph analysis, I suggest you make use of a library such as networkx