Retaining order while using Python's set diffe

2019-01-25 07:09发布

问题:

I'm doing a set difference operation in Python:

from sets import Set
from mongokit import ObjectId
x = [ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]
y = [ObjectId("4f7acde943f1e51fb6000003")]
print list(Set(x).difference(Set(y)))

I'm getting:

[ObjectId('4f7abaa043f1e51544000007'), ObjectId('4f7ac02543f1e51a44000001'), ObjectId('4f7aba8a43f1e51544000006')]

I need to get the first element for next operation which is important. How can I retain the list x in original format?

回答1:

It looks like you need an ordered set instead of a regular set.

>>> x = [ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]
>>> y = [ObjectId("4f7acde943f1e51fb6000003")]
>>> print list(OrderedSet(x) - OrderedSet(y))
[ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]

Python doesn't come with an ordered set, but it is easy to make one:

import collections

class OrderedSet(collections.Set):

    def __init__(self, iterable=()):
        self.d = collections.OrderedDict.fromkeys(iterable)

    def __len__(self):
        return len(self.d)

    def __contains__(self, element):
        return element in self.d

    def __iter__(self):
        return iter(self.d)

Hope this helps :-)



回答2:

Sets are unordered, so you will need to put the results back in the correct order after doing your set difference. Fortunately you already have the elements in the order you want, so this is easy.

diff = set(x) - set(y)
result = [o for o in x if o in diff]

But this can be streamlined; you can do the difference as part of the list comprehension (though it is arguably slightly less clear that that's what you're doing).

sety = set(y)
result = [o for o in x if o not in sety]

You could even do it without creating the set from y, but the set will provide fast membership tests, which will save you significant time if either list is large.



回答3:

You could just do this

diff = set(x) - set(y)
[item for item in x if item in diff]

or

filter(diff.__contains__, x)


标签: python set order