Getting the indices of an unsorted double array af

2019-09-02 19:24发布

问题:

This question comes as a companion of this one that regarded fastest sorting of a double array.

Now I want to get the top-k indices corresponding to the unsorted array.

I have implemented this version which (unfortunately) uses autoboxing and HashMap as proposed in some answers including this one:

HashMap<Double, Integer> map = new HashMap<Double, Integer>();
for(int i = 0; i < numClusters; i++) {
    map.put(scores[i], i);
}
Arrays.sort(scores);
HashSet<Integer> topPossibleClusters = new HashSet<Integer>();
for(int i = 0; i < numClusters; i++) {
    topPossibleClusters.add(map.get(scores[numClusters - (i+1)]));
}

As you can see this uses a HashMap with keys the Double values of the original array and as values the indices of the original array. So, after sorting the original array I just retrieve it from the map.

I also use HashSet as I am interested in deciding if an int is included in this set, using .contains() method. (I don't know if this makes a difference since as I mentioned in the other question my arrays are small -50 elements-). If this does not make a difference point it out though.

I am not interested in the value per se, only the indices.

My question is whether there is a faster approach to go with it?

回答1:

This sort of interlinking/interlocking collections lends itself to fragile, easily broken, hard to debug, unmaintainable code.

Instead create an object:

class Data {
    double value;
    int originalIndex;
}

Create an array of Data objects storing the original value and index.

Sort them using a custom comparator that looks at data.value and sorts descending.

Now the top X items in your array are the ones you want and you can just look at the value and originalIndex as you need them.



回答2:

As Tim points out linking a multiple collections is rather errorprone. I would suggest using a TreeMap as this would allow for a standalone solution.

Lets say you have double[] data, first copy it to a TreeMap:

final TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
for(int i = 0; i < data.length; ++i) {
    dataWithIndex.put(data[i], i);
}

N.B. You can declare dataWithIndex as a NavigableMap to be less specific, but it's so much longer and it doesn't really add much as there is only one implementation in the JDK.

This will populate the Map in O(n lg n) time as each put is O(lg n) - this is the same complexity as sorting. In reality it will be probably be a little slower, but it will scale in the same way.

Now, say you need the first k elements, you need to first find the kth element - this is O(k):

final Iterator<Double> keyIter = dataWithIndex.keySet().iterator();
double kthKey;
for (int i = 0; i < k; ++i) {
    kthKey = keyIter.next();
}

Now you just need to get the sub-map that has all the entries upto the kth entry:

final Map<Double, Integer> topK = dataWithIndex.headMap(kthKey, true);

If you only need to do this once, then with Java 8 you can do something like this:

List<Entry<Double, Integer>> topK = IntStream.range(0, data.length).
        mapToObj(i -> new SimpleEntry<>(data[i], i)).
        sorted(comparing(Entry::getKey)).
        limit(k).
        collect(toList());

i.e. take an IntStream for the indices of data and mapToObj to an Entry of the data[i] => i (using the AbsractMap.SimpleEntry implementation). Now sort that using Entry::getKey and limit the size of the Stream to k entries. Now simply collect the result to a List. This has the advantage of not clobbering duplicate entries in the data array.

It is almost exactly what Tim suggests in his answer, but using an existing JDK class.

This method is also O(n lg n). The catch is that if the TreeMap approach is reused then it's O(n lg n) to build the Map but only O(k) to reuse it. If you want to use the Java 8 solution with reuse then you can do:

List<Entry<Double, Integer>> sorted = IntStream.range(0, data.length).
        mapToObj(i -> new SimpleEntry<>(data[i], i)).
        sorted(comparing(Entry::getKey)).
        collect(toList());

i.e. don't limit the size to k elements. Now, to get the first k elements you just need to do:

List<Entry<Double, Integer>> subList = sorted.subList(0, k);

The magic of this is that it's O(1).