Delete Tuples in more dimensional list if same

2019-07-07 17:58发布

问题:

I have a list of tuples namely:

[[[('p', 'u'), ('r', 'w')], [('t', 'q')]], [[('p', 'u'), ('r', 'w')], [('v', 'q')]], [[('p', 'u'), ('r', 'w')], [('t', 's')]], [[('p', 'u'), ('r', 'w')], [('v', 's')]], [[('p', 'w'), ('r', 'u')], [('t', 'q')]], [[('p', 'w'), ('r', 'u')], [('v', 'q')]], [[('p', 'w'), ('r', 'u')], [('t', 's')]], [[('p', 'w'), ('r', 'u')], [('v', 's')]], [[('r', 'u'), ('p', 'w')], [('t', 'q')]], [[('r', 'u'), ('p', 'w')], [('v', 'q')]], [[('r', 'u'), ('p', 'w')], [('t', 's')]], [[('r', 'u'), ('p', 'w')], [('v', 's')]], **[[('r', 'w'), ('p', 'u')], [('t', 'q')]]**, [[('r', 'w'), ('p', 'u')], [('v', 'q')]], [[('r', 'w'), ('p', 'u')], [('t', 's')]], [[('r', 'w'), ('p', 'u')], [('v', 's')]]]

But now for example the element [[('p','u'),('r','w')], [('t','q')]]

is the same as [[('r','w'),('p','u')], [('t','q')]], which are marked fat in the list.

So in the list I have 16 elements, where every element is double.

Now, I want to delete the duplicates, that I have only the first eight elements left.

So naively, I've tried with

[[list(y) for y in set([tuple(set(x)) for x in doublegammas1])]]

But here, he says:

TypeError: unhashable type: 'list'

So my question:

How can I extend the list comprehension, that it works for a more dimensional list?

回答1:

A mutable object (such as a list or a set) cannot be a member of a set. You can use a frozenset, which is immutable.

main_list = [[[('p', 'u'), ('r', 'w')], [('t', 'q')]],
             [[('p', 'u'), ('r', 'w')], [('v', 'q')]],
             [[('p', 'u'), ('r', 'w')], [('t', 's')]],
             [[('p', 'u'), ('r', 'w')], [('v', 's')]],
             [[('p', 'w'), ('r', 'u')], [('t', 'q')]],
             [[('p', 'w'), ('r', 'u')], [('v', 'q')]],
             [[('p', 'w'), ('r', 'u')], [('t', 's')]],
             [[('p', 'w'), ('r', 'u')], [('v', 's')]],
             [[('r', 'u'), ('p', 'w')], [('t', 'q')]],
             [[('r', 'u'), ('p', 'w')], [('v', 'q')]],
             [[('r', 'u'), ('p', 'w')], [('t', 's')]],
             [[('r', 'u'), ('p', 'w')], [('v', 's')]],
             [[('r', 'w'), ('p', 'u')], [('t', 'q')]],
             [[('r', 'w'), ('p', 'u')], [('v', 'q')]],
             [[('r', 'w'), ('p', 'u')], [('t', 's')]],
             [[('r', 'w'), ('p', 'u')], [('v', 's')]]]

main_set = set(tuple(frozenset(innermost_list) for innermost_list in sublist) for sublist in main_list)

from pprint import pprint
pprint(main_set)

Output:

{(frozenset({('r', 'u'), ('p', 'w')}), frozenset({('t', 'q')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('v', 'q')})),
 (frozenset({('r', 'u'), ('p', 'w')}), frozenset({('v', 'q')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('t', 's')})),
 (frozenset({('r', 'u'), ('p', 'w')}), frozenset({('t', 's')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('v', 's')})),
 (frozenset({('r', 'u'), ('p', 'w')}), frozenset({('v', 's')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('t', 'q')}))}

To convert back to the original structure of nested lists:

new_list = [[list(frozen) for frozen in subtuple] for subtuple in main_set]
pprint(new_list)

Output:

[[[('r', 'u'), ('p', 'w')], [('t', 'q')]],
 [[('p', 'u'), ('r', 'w')], [('v', 'q')]],
 [[('r', 'u'), ('p', 'w')], [('v', 'q')]],
 [[('p', 'u'), ('r', 'w')], [('t', 's')]],
 [[('r', 'u'), ('p', 'w')], [('t', 's')]],
 [[('p', 'u'), ('r', 'w')], [('v', 's')]],
 [[('r', 'u'), ('p', 'w')], [('v', 's')]],
 [[('p', 'u'), ('r', 'w')], [('t', 'q')]]]

UPDATE:

A solution that removes the duplicate items in-place from the input data.

unique = []

for item in main_list[:]:
    frozen_item = frozenset(frozenset(innermost_list) for innermost_list in item)
    if frozen_item not in unique:
        unique.append(frozen_item)
    else:
        main_list.remove(item)


回答2:

Lists aren't hashable, tuples are hashable. You then need to take a set of these tuples. But inside these tuples, you want to disregard order. But a tuples of sets are not hashable, so instead you need to use tuples of frozenset objects:

uniques = {tuple(map(frozenset, i)) for i in doublegammas1}

print(uniques)

{(frozenset({('p', 'w'), ('r', 'u')}), frozenset({('t', 'q')})),
 (frozenset({('p', 'w'), ('r', 'u')}), frozenset({('v', 'q')})),
 (frozenset({('p', 'w'), ('r', 'u')}), frozenset({('v', 's')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('t', 's')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('t', 'q')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('v', 'q')})),
 (frozenset({('p', 'u'), ('r', 'w')}), frozenset({('v', 's')})),
 (frozenset({('p', 'w'), ('r', 'u')}), frozenset({('t', 's')}))}

You can then apply this via the itertools unique_everseen recipe, also available in 3rd party libraries as toolz.unique or more_itertools.unique_everseen:

from more_itertools import unique_everseen

def uniquekey(x):
    return tuple(map(frozenset, x))

res = list(unique_everseen(doublegammas1, key=uniquekey))

print(res)

[[[('p', 'u'), ('r', 'w')], [('t', 'q')]],
 [[('p', 'u'), ('r', 'w')], [('v', 'q')]],
 [[('p', 'u'), ('r', 'w')], [('t', 's')]],
 [[('p', 'u'), ('r', 'w')], [('v', 's')]],
 [[('p', 'w'), ('r', 'u')], [('t', 'q')]],
 [[('p', 'w'), ('r', 'u')], [('v', 'q')]],
 [[('p', 'w'), ('r', 'u')], [('t', 's')]],
 [[('p', 'w'), ('r', 'u')], [('v', 's')]]]

Input data

# input data
doublegammas1 = [[[('p', 'u'), ('r', 'w')], [('t', 'q')]],
                 [[('p', 'u'), ('r', 'w')], [('v', 'q')]],
                 [[('p', 'u'), ('r', 'w')], [('t', 's')]],
                 [[('p', 'u'), ('r', 'w')], [('v', 's')]],
                 [[('p', 'w'), ('r', 'u')], [('t', 'q')]],
                 [[('p', 'w'), ('r', 'u')], [('v', 'q')]],
                 [[('p', 'w'), ('r', 'u')], [('t', 's')]],
                 [[('p', 'w'), ('r', 'u')], [('v', 's')]],
                 [[('r', 'u'), ('p', 'w')], [('t', 'q')]],
                 [[('r', 'u'), ('p', 'w')], [('v', 'q')]],
                 [[('r', 'u'), ('p', 'w')], [('t', 's')]],
                 [[('r', 'u'), ('p', 'w')], [('v', 's')]],
                 [[('r', 'w'), ('p', 'u')], [('t', 'q')]],
                 [[('r', 'w'), ('p', 'u')], [('v', 'q')]],
                 [[('r', 'w'), ('p', 'u')], [('t', 's')]],
                 [[('r', 'w'), ('p', 'u')], [('v', 's')]]]