Writing a Set implementation with a randomElement(

2019-09-09 00:32发布

问题:

I am trying to write an implementation of Set which has an additional method randomElement() which returns an element at random from the Set. I have based the implementation on HashSet for a fast contains() method. I have also used an ArrayList so that the randomElement() method is O(1) too - with an ArrayList all you have to do is choose a random index.

Here is my code.

public final class RandomChoiceSet<E> extends AbstractSet<E> {

    private final List<E> list = new ArrayList<>();
    private final Set<E> set = new HashSet<>();
    private final Random random = new Random();

    public RandomChoiceSet() {}

    public RandomChoiceSet(Collection<? extends E> collection) {
        addAll(collection);
    }

    public E randomElement() {
        return list.get(random.nextInt(list.size()));
    }

    @Override
    public int size() {
        return list.size();
    }

    @Override
    public boolean contains(Object o) {
        return set.contains(o);
    }

    @Override
    public void clear() {
        list.clear();
        set.clear();
    }

    @Override
    public boolean add(E e) {
        boolean result = set.add(e);
        if (result)
            list.add(e);
        return result;
    }

    @Override
    public boolean remove(Object o) {
        boolean result = set.remove(o);
        if (result)
            list.remove(o);
        return result;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {

            private final Iterator<E> iterator = list.iterator();
            private E e;

            @Override
            public boolean hasNext() {
                return iterator.hasNext();
            }

            @Override
            public E next() {
                return e = iterator.next();
            }

            @Override
            public void remove() {
                iterator.remove();
                set.remove(e);
            }
        };
    }
}

The drawback to maintaining a List as well as a Set is that it makes the remove() method O(n) because the element has to be found in the ArrayList first, and then all other elements have to be moved along one place. So I was wondering whether it is possible to write such a Set where all five operations size(), contains(), add(), remove() and randomElement() are O(1)?

回答1:

The only thing I can think of is to replace your set with a HashMap that maps from your element to it's position in the arrayList. size, contains, add and random will be the same. For remove you would do as follows:

Find the element from the HashMap
Retrieve it's position in the arrayList
Remove the element from the HashMap
Swap the deleted element with the last element in the array
Modify the swapped element position in the HashMap
Delete the last element from the array //Now this is O(1)

What makes this work is that you don't need any particular order in your array, you just need a random access storage so changing the order or the data will not cause any problems as long as you keep it in sync with your hashmap.



回答2:

I think you need a custom implementation of HashMap, because it requires a fine-grain control. Randomly picking a bucket is easy enough, and you could use ArrayLists in your buckets to have random access.

To make it clear, you would implement a classic HashMap but instead of using a LinkedList in each bucket you would have an ArrayList. Picking a random element would be as easy as :

  • randomly pick an index rd0 between 0 and nbBuckets
  • randomly picking an index rd1 between 0 and buckets[rd0].size()


回答3:

Here is a functioning implementation, following Amr's solution. Even the Iterator's remove() method works, because elements are always swapped in from a later position.

public final class RandomChoiceSet<E> extends AbstractSet<E> {

    private final List<E> list = new ArrayList<>();
    private final Map<E, Integer> map = new HashMap<>();
    private final Random random = new Random();

    public RandomChoiceSet() {}

    public RandomChoiceSet(Collection<? extends E> collection) {
        addAll(collection);
    }

    public E randomElement() {
        return list.get(random.nextInt(list.size()));
    }

    @Override
    public int size() {
        return list.size();
    }

    @Override
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    @Override
    public void clear() {
        list.clear();
        map.clear();
    }

    @Override
    public boolean add(E e) {
        if (map.containsKey(e))
            return false;
        map.put(e, list.size());
        list.add(e);
        return true;
    }

    @Override
    public boolean remove(Object o) {
        Integer currentIndex = map.get(o);
        if (currentIndex == null)
            return false;
        int size = list.size();
        E lastE = list.get(size - 1);
        list.set(currentIndex, lastE);
        list.remove(size - 1);
        map.put(lastE, currentIndex);
        map.remove(o);
        return true;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {

            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < list.size();
            }

            @Override
            public E next() {
                return list.get(index++);
            }

            @Override
            public void remove() {
                RandomChoiceSet.this.remove(list.get(--index));
            }
        };
    }
}

EDIT

The more I think about this, the more useful this Set implementation seems. Although I have not done so in the code above, you can include a get() method to get elements by index. Not only is this nicer than the other ways for getting any element from a Set (e.g. set.iterator().next() or set.stream().findAny().get()), but it enables you to iterate over the set with an explicit index.

int n = set.size();
for (int i = 0; i < n; i++) {
    Object o = set.get(i);
    // do something with o.
}

I haven't done proper bench-marking, but iterating over a Set like this seems to be several times faster than iterating over a HashSet using a for each loop. Obviously there are no checks for concurrent modification, and this is a Set implementation that is more memory-hungry than a HashSet (although not as memory-hungry as a LinkedHashSet), but overall I think it's pretty interesting.



回答4:

This is a perfect example of why the Collections interface should have a getRandom() method and why it is a fundamental missing feature in the API. HashSet's implementation of getRandom() would just call it's private HashMap getRandom() method and so forth until the data representation is either indexable or iterable, point at which the getRandom() logic should be implemented.

Basically this getRandom() method would vary complexity according to the underlying implementation. But since under the hood all data eventually has to be stored either as arrays or linked lists there's a lot of optimization being cast aside by not having a Collection aware getRandom().

Think, what is a Collection? think can I retrieve a random element from a collection in the real world? yes, and so should it be in proper OO code.

However it doesn't, so you are stuck with iterating through the items in your hashSet and returning the random.nextInt(size()) element if you can't afford to build your own class.

If you can afford to build your own implementation which seems to be your case, then your suggestion is a fair approach however I don't get it why you implement your own anonymous Iterator, this should do just fine.

@Override public Iterator<E> iterator() { return set.iterator(); }