Python set with the ability to pop a random elemen

2019-02-05 17:01发布

问题:

I am in need of a Python (2.7) object that functions like a set (fast insertion, deletion, and membership checking) but has the ability to return a random value. Previous questions asked on stackoverflow have answers that are things like:

import random
random.sample(mySet, 1)

But this is quite slow for large sets (it runs in O(n) time).

Other solutions aren't random enough (they depend on the internal representation of python sets, which produces some results which are very non-random):

for e in mySet:
    break
# e is now an element from mySet

I coded my own rudimentary class which has constant time lookup, deletion, and random values.

class randomSet:
    def __init__(self):
        self.dict = {}
        self.list = []

    def add(self, item):
        if item not in self.dict:
            self.dict[item] = len(self.list)
            self.list.append(item)

    def addIterable(self, item):
        for a in item:
            self.add(a)

    def delete(self, item):
        if item in self.dict:
            index = self.dict[item]
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                del self.list[index]
            else:
                self.list[index] = self.list.pop()
                self.dict[self.list[index]] = index
                del self.dict[item]

    def getRandom(self):
        if self.list:
            return self.list[random.randomint(0,len(self.list)-1)]

    def popRandom(self):
        if self.list:
            index = random.randint(0,len(self.list)-1)
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                return self.list.pop()
            returnValue = self.list[index]
            self.list[index] = self.list.pop()
            self.dict[self.list[index]] = index
            del self.dict[returnValue]
            return returnValue

Are there any better implementations for this, or any big improvements to be made to this code?

回答1:

I think the best way to do this would be to use the MutableSet abstract base class in collections. Inherit from MutableSet, and then define add, discard, __len__, __iter__, and __contains__; also rewrite __init__ to optionally accept a sequence, just like the set constructor does. MutableSet provides built-in definitions of all other set methods based on those methods. That way you get the full set interface cheaply. (And if you do this, addIterable is defined for you, under the name extend.)

discard in the standard set interface appears to be what you have called delete here. So rename delete to discard. Also, instead of having a separate popRandom method, you could just define popRandom like so:

def popRandom(self):
    item = self.getRandom()
    self.discard(item)
    return item

That way you don't have to maintain two separate item removal methods.

Finally, in your item removal method (delete now, discard according to the standard set interface), you don't need an if statement. Instead of testing whether index == len(self.list) - 1, simply swap the final item in the list with the item at the index of the list to be popped, and make the necessary change to the reverse-indexing dictionary. Then pop the last item from the list and remove it from the dictionary. This works whether index == len(self.list) - 1 or not:

def discard(self, item):
    if item in self.dict:
        index = self.dict[item]
        self.list[index], self.list[-1] = self.list[-1], self.list[index]
        self.dict[self.list[index]] = index
        del self.list[-1]                    # or in one line:
        del self.dict[item]                  # del self.dict[self.list.pop()]


回答2:

One approach you could take is to derive a new class from set which salts itself with random objects of a type derived from int.

You can then use pop to select a random element, and if it is not of the salt type, reinsert and return it, but if it is of the salt type, insert a new, randomly-generated salt object (and pop to select a new object).

This will tend to alter the order in which objects are selected. On average, the number of attempts will depend on the proportion of salting elements, i.e. amortised O(k) performance.



回答3:

Can't we implement a new class inheriting from set with some (hackish) modifications that enable us to retrieve a random element from the list with O(1) lookup time? Btw, on Python 2.x you should inherit from object, i.e. use class randomSet(object). Also PEP8 is something to consider for you :-)

Edit: For getting some ideas of what hackish solutions might be capable of, this thread is worth reading: http://python.6.n6.nabble.com/Get-item-from-set-td1530758.html



回答4:

Yes, I'd implement an "ordered set" in much the same way you did - and use a list as an internal data structure.

However, I'd inherit straight from "set" and just keep track of the added items in an internal list (as you did) - and leave the methods I don't use alone.

Maybe add a "sync" method to update the internal list whenever the set is updated by set-specific operations, like the *_update methods.

That if using an "ordered dict" does not cover your use cases. (I just found that trying to cast ordered_dict keys to a regular set is not optmized, so if you need set operations on your data that is not an option)



回答5:

If you don't mind only supporting comparable elements, then you could use blist.sortedset.



回答6:

Here's a solution from scratch, which adds and pops in constant time. I also included some extra set functions for demonstrative purposes.

from random import randint


class RandomSet(object):
  """
  Implements a set in which elements can be
  added and drawn uniformly and randomly in
  constant time.
  """

  def __init__(self, seq=None):
    self.dict = {}
    self.list = []
    if seq is not None:
      for x in seq:
        self.add(x)

  def add(self, x):
    if x not in self.dict:
      self.dict[x] = len(self.list)
      self.list.append(x)

  def pop(self, x=None):
    if x is None:
      i = randint(0,len(self.list)-1)
      x = self.list[i]
    else:
      i = self.dict[x]
    self.list[i] = self.list[-1]
    self.dict[self.list[-1]] = i
    self.list.pop()
    self.dict.pop(x)
    return x

  def __contains__(self, x):
    return x in self.dict

  def __iter__(self):
    return iter(self.list)

  def __repr__(self):
    return "{" + ", ".join(str(x) for x in self.list) + "}"

  def __len__(self):
    return len(self.list)