So here's what I want to do: I have a list that contains several equivalence relations:
l = [[1, 2], [2, 3], [4, 5], [6, 7], [1, 7]]
And I want to union the sets that share one element. Here is a sample implementation:
def union(lis):
lis = [set(e) for e in lis]
res = []
while True:
for i in range(len(lis)):
a = lis[i]
if res == []:
res.append(a)
else:
pointer = 0
while pointer < len(res):
if a & res[pointer] != set([]) :
res[pointer] = res[pointer].union(a)
break
pointer +=1
if pointer == len(res):
res.append(a)
if res == lis:
break
lis,res = res,[]
return res
And it prints
[set([1, 2, 3, 6, 7]), set([4, 5])]
This does the right thing but is way too slow when the equivalence relations is too large. I looked up the descriptions on union-find algorithm: http://en.wikipedia.org/wiki/Disjoint-set_data_structure but I still having problem coding a Python implementation.
I think this is an elegant solution, using the built in set functions:
It returns a list of sets.
This works by completely exhausting one equivalence at a time. When an element finds it's equivalence it is removed from the original set and no longer searched.
OUTPUT: [set([1, 2, 3, 6, 7]), set([4, 5])]
Perhaps something like this?
Solution that runs in
O(n)
timeHow it works:
indices_dict
gives a map from an equivalence # to an index inlis
. E.g.1
is mapped to index0
and4
inlis
.disjoint_indices
gives a list of disjoint sets of indices. Each set corresponds to indices in an equivalence. E.g.lis[0]
andlis[3]
are in the same equivalence but notlis[2]
.disjoint_set
converts disjoint indices into into their proper equivalences.Time complexity
The
O(n)
time complexity is difficult to see but I'll try to explain. Here I will usen = len(lis)
.indices_dict
certainly runs inO(n)
time because only 1 for-loopdisjoint_indices
is the hardest to see. It certainly runs inO(len(d))
time since the outer loop stops whend
is empty and the inner loop removes an element ofd
each iteration. now, thelen(d) <= 2n
sinced
is a map from equivalence #'s to indices and there are at most2n
different equivalence #'s inlis
. Therefore, the function runs inO(n)
.disjoint_sets
is difficult to see because of the 3 combined for-loops. However, you'll notice that at mosti
can run over alln
indices inlis
andx
runs over the 2-tuple, so the total complexity is2n = O(n)