I have a list like below
L = [[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]]
The output should be
[[1,2],[1,3],[1,4],[2,5],[3,2]]
Please note that the order for the pairs and the elements has to be preserved. In other words, I cannot sort the list because the elements order needs to be preserved for the pairs as well. For example, I need the last element [3,2] to be preserved. If i sort them and remove duplicates this will be changed to [2,3] which I do not want. Also when I say I need to remove duplicates [1,2] or [2,1] is considered to be a duplicate and I want to preserve [1,2] from this.
The
unique_everseen
function in theitertools
recipes in the docs does exactly what you need:The basic idea is that it keeps a set of all values seen so far, and skips any value that's already in the set.
The
key
function works the same way as insort
,max
, etc., as explained in the Sorting HOWTO. You want to make two lists that have the same values in a different order match, so we need to compare the set of each list's values, instead of the lists themselves. (The reason we needfrozenset
instead ofset
is thatset
is mutable, and therefore can't be stored in a set.)If you had more than 2 elements in your sublists, the question would be ambiguous. If you have, say,
[1, 1, 2]
and[1, 2, 2]
, do you want them to be considered duplicates, or not?key=frozenset
.collections.Counter
, but there's noFrozenCounter
(and building one just for this purpose is probably overkill). You can simulate one in a few ways:key=lambda sublist: frozenset(Counter(sublist).items())
key=lambda sublist: sorted(Counter(sublist).items())
key=lambda sublist: tuple(sorted(sublist))
Since your initial thought was to sort the sublists—which was only unacceptable because you need to end up with the original value, not the sorted value—I think the last of these options is most likely to be the one you'd want, but that's still really just a guess.
You can just copy and paste the function from the docs into your code:
… or install the third-party library
more_itertools
and use itsunique_everseen
it from there. Or the different third-party librarytoolz
has an equivalent function namedunique
.This works similar to removing duplicates from a regular list whilst preserving order, but we need to do something about the sublists not being hashable and the order of the sublists being irrelevant.
We can use frozen sets to solve both these issues at once.