Partial sort Collection with limit and custom Comp

Posted 2020-07-23 06:32

Question:

I want to sort an ArrayList called imageList like this:

Collections.sort(imageList, new MapComparator(Function.KEY_TIMESTAMP, "dsc"));

This works fine, but for performance reasons I now want to be able to set a limit: show only the newest 100 images. The ArrayList is unsorted, so simply creating a sublist won't work.

My MapComparator class looks like this:

class MapComparator implements Comparator<HashMap<String, String>>
{
    private final String key;
    private final String order;

    public MapComparator(String key, String order)
    {
        this.key = key;
        this.order = order;
    }

    public int compare(HashMap<String, String> first,
                       HashMap<String, String> second)
    {
        String firstValue = first.get(key);
        String secondValue = second.get(key);
        if(this.order.toLowerCase().contentEquals("asc"))
        {
            return firstValue.compareTo(secondValue);
        }else{
            return secondValue.compareTo(firstValue);
        }

    }
}

Does anyone know how to implement that? Thanks in advance!

Answer 1:

I'm not aware of an official name for this kind of problem, but it does occur reasonably frequently, and it's often called something like a top-k or greatest-k problem.

You certainly have to process all the elements in the input, because the last element might belong in the "top k" set and you don't know until you've processed every last element. However, you don't have to sort the entire input. Sorting and then taking a sublist, or calling sorted() followed by limit() on a stream, can be very expensive, since sorting N input elements is O(N log N). By keeping track of only the greatest k elements seen so far as you run through the list, you can reduce this to O(N log k) with a heap of size k, which is effectively O(N) when k is a small constant.

Guava has a Collector that does exactly this: Comparators.greatest(k, comparator).

If you don't want to use Guava, it's not too difficult to build your own collector that's more-or-less equivalent. A PriorityQueue is quite useful for this purpose. Here's a first cut at it:

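import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collector;

static <T> Collector<T,PriorityQueue<T>,List<T>> topK(int k, Comparator<? super T> comp) {
    return Collector.of(
        // supplier: a min-heap under comp, so the head is the smallest kept element
        () -> new PriorityQueue<>(k+1, comp),
        // accumulator: add, then drop the smallest once more than k are held
        (pq, t) -> {
            pq.add(t);
            if (pq.size() > k)
                pq.poll();
        },
        // combiner: merge two partial results and trim back down to k
        (pq1, pq2) -> {
            pq1.addAll(pq2);
            while (pq1.size() > k)
                pq1.poll();
            return pq1;
        },
        // finisher: poll in ascending order, filling the array from the back
        pq -> {
            int n = pq.size();
            @SuppressWarnings("unchecked")
            T[] a = (T[])new Object[n];
            while (--n >= 0)
                a[n] = pq.poll();
            return Arrays.asList(a);
        },
        Collector.Characteristics.UNORDERED);
}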

This uses a PriorityQueue as an intermediate data structure. As elements are added, the smallest element is trimmed off when the queue exceeds k in size. At the end, the elements are pulled from the queue and put into a list in reverse order, so the resulting list is sorted highest to lowest.

For example, given a List<Integer> containing

[920, 203, 880, 321, 181, 623, 496, 576, 854, 323,
 339, 100, 795, 165, 857, 935, 555, 648, 837, 975]

one can do

List<Integer> out = input.stream()
                         .collect(topK(5, Comparator.naturalOrder()));

resulting in

[975, 935, 920, 880, 857]

As an aside, it's possible to create a map comparator much more simply by using the combinator methods in the Comparator class. For example, suppose your input looks like this:

    List<Map<String, String>> input =
        List.of(Map.of("name", "map1", "timestamp", "00017"),
                Map.of("name", "map2", "timestamp", "00192"),
                Map.of("name", "map3", "timestamp", "00001"),
                Map.of("name", "map4", "timestamp", "00072"),
                Map.of("name", "map5", "timestamp", "04037"));

You can easily sort the maps by timestamp like this:

    input.stream()
         .sorted(Comparator.comparing(map -> map.get("timestamp")))
         .forEach(System.out::println);

You could equally collect them into a list, or sort in place with sort(comparator) on a mutable list (the List.of list above is immutable). You can reverse the sort by doing:

    input.stream()
         .sorted(Comparator.comparing(map -> map.get("timestamp"), Comparator.reverseOrder()))
         .forEach(System.out::println);

The output of the latter will then be:

{name=map5, timestamp=04037}
{name=map2, timestamp=00192}
{name=map4, timestamp=00072}
{name=map1, timestamp=00017}
{name=map3, timestamp=00001}
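
To tie the two halves of this answer back to the question, here is a minimal sketch that applies the bounded-heap idea to a list of maps without streams. The class name NewestImages and the method greatest are hypothetical, and the "timestamp" key stands in for Function.KEY_TIMESTAMP, whose value the question does not show. It also assumes the timestamps are zero-padded strings of equal width, since they are compared as strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class NewestImages {
    // Hypothetical helper: returns up to k greatest elements under comp, greatest first.
    static <T> List<T> greatest(List<T> items, int k, Comparator<? super T> comp) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap under comp: the head is the smallest element currently kept.
        PriorityQueue<T> heap = new PriorityQueue<>(k + 1, comp);
        for (T item : items) {
            heap.add(item);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest so at most k remain
            }
        }
        List<T> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll()); // polled in ascending order
        }
        Collections.reverse(result); // greatest (newest) first
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> input =
            List.of(Map.of("name", "map1", "timestamp", "00017"),
                    Map.of("name", "map2", "timestamp", "00192"),
                    Map.of("name", "map3", "timestamp", "00001"),
                    Map.of("name", "map4", "timestamp", "00072"),
                    Map.of("name", "map5", "timestamp", "04037"));

        // Assumed key "timestamp"; string comparison relies on equal-width padding.
        Comparator<Map<String, String>> byTimestamp =
            Comparator.comparing(m -> m.get("timestamp"));

        // Prints map5, map2, map4 (one per line).
        for (Map<String, String> m : greatest(input, 3, byTimestamp)) {
            System.out.println(m.get("name"));
        }
    }
}
```

For the question's case, the call would be along the lines of greatest(imageList, 100, comparator) with an ascending timestamp comparator, so that the "greatest" elements are the newest ones.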


Answer 2:

Use a sorted Stream:

List<HashMap<String, String>> newestImages = 
    imageList.stream()
             .sorted(new MapComparator(Function.KEY_TIMESTAMP, "dsc"))
             .limit(100)
             .collect(Collectors.toList());

However, this will require processing all the elements in your List. You can't avoid that if you want sorted output.