I have an unsorted Collection of objects [that are comparable], is it possible to get a sub list of the collection of the list without having to call sort?
I was looking at the possibility of doing a SortedList with a limited capacity, but that didn't look like the right option.
I could easily write this, but I was wondering if there was another way.
I am not able to modify the existing collection's structure.
Since you don't want to call sort()
, it seems like you are trying to avoid an O(n log(n)) runtime cost. There is actually a way to do that in O(n) time -- you can use a selection algorithm.
There are methods to do this in the Guava libraries (Google's core Java libraries); look in Ordering
and check out:
public <E extends T> List<E> Ordering.leastOf(Iterable iterable, int k)
public <E extends T> List<E> Ordering.greatestOf(Iterable iterable, int k)
These are implementations of quickselect, and since they're written generically, you could just call them on your Set
and get a list of the k
smallest things. If you don't want to use the entire Guava libraries, the docs link to the source code, and I think it should be straightforward to port the methods to your project.
If you don't want to deviate too far from the standard libraries, you can always use a sorted set like TreeSet
, though this gets you logarithmic insert/remove time instead of the nice O(1) performance of the hash-based Set
, and it ends up being O(n log(n)) in the end. Others have mentioned using heaps. This will also get you O(n log(n)) running time, unless you use some of the fancier heap variants. There's a fibonacci heap implementation in GraphMaker if you're looking for one of those.
Which of these makes sense really depends on your project, but I think that covers most of the options.
I would probably create a sorted set. Insert the first N items from your unsorted collection into your sorted set. Then for the remainder of your unsorted collection:
- insert each item in the sorted set
- delete the largest item from the sorted set
- Repeat until you've processed all items in the unsorted collection
Yes, you can put all of them into a max heap data structure with a fixed size of N, conditionally, if the item is smaller than the largest in the max heap (by checking with the get()
"peek" method). Once you have done so they will, by definition, be the N smallest. Optimal implementations will perform with O(M)+lg(N)
or O(M)
(where M is the size of the set) performance, which is theoretically fastest. Here's some pseudocode:
MaxHeap maxHeap = new MaxHeap(N);
for (Item x : mySetOfItems) {
if (x < maxHeap.get()) {
maxHeap.add(x);
}
}
The Apache Commons Collections class PriorityBuffer seems to be their flagship binary heap data structure, try using that one.
http://en.wikipedia.org/wiki/Heap_%28data_structure%29
don't you just want to make a heap?