I tried a few selected answers, but they don't seem to work. I have to merge a list of lists.
[[283, 311], [311, 283, 316], [150, 68], [86, 119], [259, 263], [263, 259, 267], [118, 87], [262, 264], [264, 262], [85, 115], [115, 85], [244, 89], [140, 76, 82, 99], [236, 111], [330, 168], [76, 63, 107, 124, 128, 135, 140], [131, 254], [254, 131], [21, 38], [38, 21], [220, 291], [291, 220], [296, 46], [64, 53, 57, 61, 63, 65, 66, 76, 96, 100, 103, 114, 123, 127, 128, 130, 148, 149], [274, 240], [157, 225, 234], [225, 247], [233, 44], [89, 244], [80, 101], [210, 214], [78, 155], [55, 139], [102, 74, 75, 132], [105, 252], [149, 55, 59, 63, 71, 73, 81, 100, 102, 116, 122, 138, 146], [97, 231], [231, 97], [155, 78], [239, 305], [305, 239], [145, 94, 248], [147, 150], [61, 64], [152, 219], [219, 152], [246, 250], [252, 105], [223, 235], [235, 223], [237, 60, 344], [344, 237], [182, 129], [331, 117], [12, 2, 8, 10, 13, 15], [250, 246]]
All I want is, for example: [283, 311] exists in [311, 283, 316], so both should be merged, leaving only one list in the above list. I need to merge a list if it exists in some other list.
Please note that I can do it using a loop inside a loop, but I am looking for a Pythonic way to achieve this. Also, if you know how to merge lists sharing at least one common element, please share.
Edit:
Please don't forget that I am looking for a Pythonic approach. If it is not possible with any core Pythonic approach, then what should be the next possible solution using loops? I need efficiency, as I have to merge a list of lists with more than a million items every half an hour. I did it with a for loop inside a for loop, comparing two items at a time, but it takes too much time.
I think this works ...
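The code block was not preserved here, so the following is only a sketch of what such an approach could look like (the function name merge_subsets is mine, not the answer's): convert every inner list to a frozenset and drop any set that is a proper subset of another one.

```python
# Assumed reconstruction, not the original answer's code: keep only the
# maximal sets, i.e. drop any set that is a proper subset of another.
def merge_subsets(lists):
    sets = {frozenset(lst) for lst in lists}   # also dedupes exact duplicates
    return [s for s in sets
            if not any(s < other for other in sets)]   # quadratic check
```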
Note that this gives you a list of frozenset objects, but you can easily turn that back into a list of lists (see the snippet after this paragraph). Unfortunately, it has quadratic complexity ... I'm not sure if there's anything to really be done about that ... (keeping it as lists would result in much worse complexity, FWIW).
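For instance, using the assumed merge_subsets helper from the sketch above, with the question's list of lists bound to a variable called data:

```python
merged = merge_subsets(data)             # list of frozensets
as_lists = [sorted(s) for s in merged]   # frozensets back into plain lists
```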
Here's a second approach which is a little better algorithmically:
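Again, the original block is missing, so this is only a sketch of what a sorted/takewhile variant could look like; the structure (and the name merge_subsets_sorted) is assumed rather than taken from the answer.

```python
from itertools import takewhile

def merge_subsets_sorted(lists):
    # Largest sets first: a set can only be contained in one that is
    # strictly larger, so we only need to scan the leading slice of the
    # already-kept sets instead of every pair.
    sets = sorted({frozenset(lst) for lst in lists}, key=len, reverse=True)
    kept = []
    for s in sets:
        bigger = takewhile(lambda other: len(other) > len(s), kept)
        if not any(s <= other for other in bigger):
            kept.append(s)
    return kept
```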
Of course, it would need to be timed on your real data to see if it actually performs better (for the test data, the lists aren't big enough, and the overhead of generating the conditional function, the takewhile instance and the lambda makes it not worthwhile). For the curious, here's my benchmark:
To answer your latter question: it's a connected-components problem.
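No code was given for this part, but one way to treat "merge lists that share at least one common element" as connected components is a small union-find over the elements (a sketch under that assumption, not the answer's code):

```python
def merge_overlapping(lists):
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for lst in lists:
        if not lst:
            continue
        for x in lst:
            parent.setdefault(x, x)
        for x in lst[1:]:
            union(lst[0], x)            # everything in lst joins one component

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return [sorted(g) for g in groups.values()]
```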
Here's an alternative approach that I think should be faster than @mgilson's if you have a large number of small sets:
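The answer's code block is missing here, so the following is a reconstruction based only on the description below: build an index from each element to the sets containing it, then for each set intersect the index entries of its elements to find candidate supersets (names like remove_contained are mine).

```python
from collections import defaultdict
from functools import reduce

def remove_contained(lists):
    # Deduplicate so that e.g. [131, 254] and [254, 131] count as one set.
    sets = list({frozenset(lst) for lst in lists})

    # Index: element -> indices of the sets that contain it.
    where = defaultdict(set)
    for i, s in enumerate(sets):
        for x in s:
            where[x].add(i)

    kept = []
    for i, s in enumerate(sets):
        # Intersecting the index entries leaves exactly the sets containing
        # every element of s, i.e. its supersets.  Stopping the reduce as
        # soon as it hits an empty set would be the further optimization
        # mentioned below.
        supersets = reduce(set.intersection, (where[x] for x in s)) - {i}
        if not supersets:
            kept.append(sorted(s))
    return kept
```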
The difference is mainly in the final loop, which runs not through all n(n-1)/2 distinct pairs of sets, but only through the n sets in the outer loop and then through a set of candidate supersets that contain some element of the set under consideration. It can be optimized further by stopping the reduce early when it produces an empty set.