I am trying to write an implementation of Set
which has an additional method randomElement()
which returns an element at random from the Set
. I have based the implementation on HashSet
for a fast contains()
method. I have also used an ArrayList
so that the randomElement()
method is O(1)
too - with an ArrayList
all you have to do is choose a random index.
Here is my code.
public final class RandomChoiceSet<E> extends AbstractSet<E> {
private final List<E> list = new ArrayList<>();
private final Set<E> set = new HashSet<>();
private final Random random = new Random();
public RandomChoiceSet() {}
public RandomChoiceSet(Collection<? extends E> collection) {
addAll(collection);
}
public E randomElement() {
return list.get(random.nextInt(list.size()));
}
@Override
public int size() {
return list.size();
}
@Override
public boolean contains(Object o) {
return set.contains(o);
}
@Override
public void clear() {
list.clear();
set.clear();
}
@Override
public boolean add(E e) {
boolean result = set.add(e);
if (result)
list.add(e);
return result;
}
@Override
public boolean remove(Object o) {
boolean result = set.remove(o);
if (result)
list.remove(o);
return result;
}
@Override
public Iterator<E> iterator() {
return new Iterator<E>() {
private final Iterator<E> iterator = list.iterator();
private E e;
@Override
public boolean hasNext() {
return iterator.hasNext();
}
@Override
public E next() {
return e = iterator.next();
}
@Override
public void remove() {
iterator.remove();
set.remove(e);
}
};
}
}
The drawback to maintaining a List
as well as a Set
is that it makes the remove()
method O(n)
because the element has to be found in the ArrayList
first, and then all other elements have to be moved along one place. So I was wondering whether it is possible to write such a Set
where all five operations size()
, contains()
, add()
, remove()
and randomElement()
are O(1)
?
The only thing I can think of is to replace your set with a HashMap that maps from your element to it's position in the arrayList. size, contains, add and random will be the same. For remove you would do as follows:
Find the element from the HashMap
Retrieve it's position in the arrayList
Remove the element from the HashMap
Swap the deleted element with the last element in the array
Modify the swapped element position in the HashMap
Delete the last element from the array //Now this is O(1)
What makes this work is that you don't need any particular order in your array, you just need a random access storage so changing the order or the data will not cause any problems as long as you keep it in sync with your hashmap.
I think you need a custom implementation of HashMap
, because it requires a fine-grain control. Randomly picking a bucket is easy enough, and you could use ArrayList
s in your buckets to have random access.
To make it clear, you would implement a classic HashMap
but instead of using a LinkedList
in each bucket you would have an ArrayList
. Picking a random element would be as easy as :
- randomly pick an index
rd0
between 0 and nbBuckets
- randomly picking an index
rd1
between 0 and buckets[rd0].size()
Here is a functioning implementation, following Amr's solution. Even the Iterator
's remove()
method works, because elements are always swapped in from a later position.
public final class RandomChoiceSet<E> extends AbstractSet<E> {
private final List<E> list = new ArrayList<>();
private final Map<E, Integer> map = new HashMap<>();
private final Random random = new Random();
public RandomChoiceSet() {}
public RandomChoiceSet(Collection<? extends E> collection) {
addAll(collection);
}
public E randomElement() {
return list.get(random.nextInt(list.size()));
}
@Override
public int size() {
return list.size();
}
@Override
public boolean contains(Object o) {
return map.containsKey(o);
}
@Override
public void clear() {
list.clear();
map.clear();
}
@Override
public boolean add(E e) {
if (map.containsKey(e))
return false;
map.put(e, list.size());
list.add(e);
return true;
}
@Override
public boolean remove(Object o) {
Integer currentIndex = map.get(o);
if (currentIndex == null)
return false;
int size = list.size();
E lastE = list.get(size - 1);
list.set(currentIndex, lastE);
list.remove(size - 1);
map.put(lastE, currentIndex);
map.remove(o);
return true;
}
@Override
public Iterator<E> iterator() {
return new Iterator<E>() {
private int index = 0;
@Override
public boolean hasNext() {
return index < list.size();
}
@Override
public E next() {
return list.get(index++);
}
@Override
public void remove() {
RandomChoiceSet.this.remove(list.get(--index));
}
};
}
}
EDIT
The more I think about this, the more useful this Set
implementation seems. Although I have not done so in the code above, you can include a get()
method to get elements by index. Not only is this nicer than the other ways for getting any element from a Set
(e.g. set.iterator().next()
or set.stream().findAny().get()
), but it enables you to iterate over the set with an explicit index.
int n = set.size();
for (int i = 0; i < n; i++) {
Object o = set.get(i);
// do something with o.
}
I haven't done proper bench-marking, but iterating over a Set
like this seems to be several times faster than iterating over a HashSet
using a for each
loop. Obviously there are no checks for concurrent modification, and this is a Set
implementation that is more memory-hungry than a HashSet
(although not as memory-hungry as a LinkedHashSet
), but overall I think it's pretty interesting.
This is a perfect example of why the Collections interface should have a getRandom()
method and why it is a fundamental missing feature in the API. HashSet's implementation of getRandom()
would just call it's private HashMap getRandom()
method and so forth until the data representation is either indexable or iterable, point at which the getRandom()
logic should be implemented.
Basically this getRandom()
method would vary complexity according to the underlying implementation. But since under the hood all data eventually has to be stored either as arrays or linked lists there's a lot of optimization being cast aside by not having a Collection aware getRandom().
Think, what is a Collection? think can I retrieve a random element from a collection in the real world? yes, and so should it be in proper OO code.
However it doesn't, so you are stuck with iterating through the items in your hashSet and returning the random.nextInt(size())
element if you can't afford to build your own class.
If you can afford to build your own implementation which seems to be your case, then your suggestion is a fair approach however I don't get it why you implement your own anonymous Iterator, this should do just fine.
@Override
public Iterator<E> iterator() {
return set.iterator();
}