I am in need of a Python (2.7) object that functions like a set (fast insertion, deletion, and membership checking) but has the ability to return a random value. Previous questions asked on stackoverflow have answers that are things like:
import random
random.sample(mySet, 1)
But this is quite slow for large sets (it runs in O(n) time).
Other solutions aren't random enough (they depend on the internal representation of python sets, which produces some results which are very non-random):
for e in mySet:
break
# e is now an element from mySet
I coded my own rudimentary class which has constant time lookup, deletion, and random values.
class randomSet:
def __init__(self):
self.dict = {}
self.list = []
def add(self, item):
if item not in self.dict:
self.dict[item] = len(self.list)
self.list.append(item)
def addIterable(self, item):
for a in item:
self.add(a)
def delete(self, item):
if item in self.dict:
index = self.dict[item]
if index == len(self.list)-1:
del self.dict[self.list[index]]
del self.list[index]
else:
self.list[index] = self.list.pop()
self.dict[self.list[index]] = index
del self.dict[item]
def getRandom(self):
if self.list:
return self.list[random.randomint(0,len(self.list)-1)]
def popRandom(self):
if self.list:
index = random.randint(0,len(self.list)-1)
if index == len(self.list)-1:
del self.dict[self.list[index]]
return self.list.pop()
returnValue = self.list[index]
self.list[index] = self.list.pop()
self.dict[self.list[index]] = index
del self.dict[returnValue]
return returnValue
Are there any better implementations for this, or any big improvements to be made to this code?
If you don't mind only supporting comparable elements, then you could use
blist.sortedset
.Can't we implement a new class inheriting from
set
with some (hackish) modifications that enable us to retrieve a random element from the list with O(1) lookup time? Btw, on Python 2.x you should inherit fromobject
, i.e. useclass randomSet(object)
. Also PEP8 is something to consider for you :-)Edit: For getting some ideas of what hackish solutions might be capable of, this thread is worth reading: http://python.6.n6.nabble.com/Get-item-from-set-td1530758.html
I think the best way to do this would be to use the
MutableSet
abstract base class incollections
. Inherit fromMutableSet
, and then defineadd
,discard
,__len__,
__iter__
, and__contains__
; also rewrite__init__
to optionally accept a sequence, just like theset
constructor does.MutableSet
provides built-in definitions of all otherset
methods based on those methods. That way you get the fullset
interface cheaply. (And if you do this,addIterable
is defined for you, under the nameextend
.)discard
in the standardset
interface appears to be what you have calleddelete
here. So renamedelete
todiscard
. Also, instead of having a separatepopRandom
method, you could just definepopRandom
like so:That way you don't have to maintain two separate item removal methods.
Finally, in your item removal method (
delete
now,discard
according to the standard set interface), you don't need an if statement. Instead of testing whetherindex == len(self.list) - 1
, simply swap the final item in the list with the item at the index of the list to be popped, and make the necessary change to the reverse-indexing dictionary. Then pop the last item from the list and remove it from the dictionary. This works whetherindex == len(self.list) - 1
or not:Yes, I'd implement an "ordered set" in much the same way you did - and use a list as an internal data structure.
However, I'd inherit straight from "set" and just keep track of the added items in an internal list (as you did) - and leave the methods I don't use alone.
Maybe add a "sync" method to update the internal list whenever the set is updated by set-specific operations, like the *_update methods.
That if using an "ordered dict" does not cover your use cases. (I just found that trying to cast ordered_dict keys to a regular set is not optmized, so if you need set operations on your data that is not an option)
Here's a solution from scratch, which adds and pops in constant time. I also included some extra set functions for demonstrative purposes.
One approach you could take is to derive a new class from
set
which salts itself with random objects of a type derived fromint
.You can then use
pop
to select a random element, and if it is not of the salt type, reinsert and return it, but if it is of the salt type, insert a new, randomly-generated salt object (and pop to select a new object).This will tend to alter the order in which objects are selected. On average, the number of attempts will depend on the proportion of salting elements, i.e. amortised O(k) performance.