I need an automatically sorted-by-values map in Java, so that it stays sorted at all times while I add new key-value pairs, update the value of an existing key-value pair, or even delete an entry.
Please also keep in mind that this map is going to be really big (hundreds of thousands, or even tens of millions of entries).
So basically I'm looking for the following functionality:
Suppose we had a class 'SortedByValuesMap' that implements the aforementioned functionality, and we have the following code:
SortedByValuesMap<String, Long> sorted_map = new SortedByValuesMap<String, Long>();
sorted_map.put("apples", 4L);
sorted_map.put("oranges", 2L);
sorted_map.put("bananas", 1L);
sorted_map.put("lemons", 3L);
sorted_map.put("bananas", 6L);
for (String key : sorted_map.keySet()) {
    System.out.println(key + ":" + sorted_map.get(key));
}
the output should be:
bananas:6
apples:4
lemons:3
oranges:2
In particular, what's really important for me is to be able to get the entry with the lowest value at any time, using a command like:
smallestItem = sorted_map.lastEntry();
which should give me the 'oranges' entry
EDIT: I am a Java newbie so please elaborate a bit in your answers - thanks
EDIT2: This might help: I am using this for counting words (for those who are familiar: n-grams in particular) in huge text files. So I need to build a map where keys are words and values are the frequencies of those words. However, due to limitations (like RAM), I want to keep only the X most frequent words, but of course you can't know beforehand which words are going to be the most frequent. So, the way I thought it might work (as an approximation) is to start counting words, and when the map reaches an upper limit (like 1 million entries), the least frequent entry is deleted so that the map's size always stays at 1 million.
I needed a similar structure to keep a list of objects ordered by associated values. Based on the suggestion from Mechanical snail in this thread, I coded up a basic implementation of such a map. Feel free to use it.
This implementation does not honor all the contracts of the Map interface, such as reflecting value changes and removals made through the returned key set and entry set in the actual map, but such a solution would be a bit large to include in a forum like this. Perhaps I will work on one and make it available via github or something similar.
Keep 2 data structures:
1. A HashMap<String, Long>.
2. An "array" to keep track of order, such that list[count] holds a Set<String> of words with that count.
I'm writing this as though it were an array as a notational convenience. In fact, you probably don't know an upper bound on the number of occurrences, so you need a resizable data structure. Implement using a Map<Long, Set<String>>. Or, if that uses too much memory, use an ArrayList<Set<String>> (you'll have to test for count == size() - 1, and if so, use add() instead of set(count + 1)).
To increment the number of occurrences for a word (pseudocode):
To iterate over words in order (pseudocode):
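As a rough Java sketch of both operations (incrementing a word and iterating in order), something along these lines might work. The class name BucketedWordCounter and its method names are made up for illustration, and I am assuming a TreeMap as the Map<Long, Set<String>> so that the counts themselves stay sorted; it is a sketch under those assumptions, not a full Map implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical class: word counts plus "buckets" of words grouped by count.
class BucketedWordCounter {
    // word -> current count
    private final Map<String, Long> counts = new HashMap<>();
    // count -> words that currently have exactly that count
    // (assumption: a TreeMap, so the counts are kept in sorted order)
    private final TreeMap<Long, Set<String>> buckets = new TreeMap<>();

    /** Increments a word's count, moving it from bucket n to bucket n + 1. */
    void increment(String word) {
        long old = counts.getOrDefault(word, 0L);
        if (old > 0) {
            Set<String> oldBucket = buckets.get(old);
            oldBucket.remove(word);
            if (oldBucket.isEmpty()) {
                buckets.remove(old); // drop empty buckets to save memory
            }
        }
        long updated = old + 1;
        counts.put(word, updated);
        buckets.computeIfAbsent(updated, c -> new LinkedHashSet<>()).add(word);
    }

    /** Returns all words, from the most frequent to the least frequent. */
    List<String> wordsByDescendingCount() {
        List<String> result = new ArrayList<>();
        for (Set<String> bucket : buckets.descendingMap().values()) {
            result.addAll(bucket);
        }
        return result;
    }

    /** Removes and returns one word with the lowest count, or null if empty. */
    String removeLeastFrequent() {
        if (buckets.isEmpty()) {
            return null;
        }
        Map.Entry<Long, Set<String>> lowest = buckets.firstEntry();
        Set<String> bucket = lowest.getValue();
        String word = bucket.iterator().next();
        bucket.remove(word);
        if (bucket.isEmpty()) {
            buckets.remove(lowest.getKey());
        }
        counts.remove(word);
        return word;
    }

    /** Returns the current count of a word (0 if it was never seen). */
    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

Incrementing touches only two buckets, and removeLeastFrequent() gives the eviction step needed to cap the map at a fixed size.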
if all you need is the "min" value, then just use a normal map and keep track of the "min" value anytime it is modified.
EDIT:
so, if you really need value ordering and you want to use out-of-the-box solutions, you basically need 2 collections. One normal map (e.g. HashMap), and one SortedSet (e.g. TreeSet). you can traverse ordered elements via the TreeSet, and find frequencies by key using the HashMap.
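a minimal sketch of that two-collection idea, assuming Java 8+. the class name ValueOrderedCounts and its methods are made up for illustration. note the comparator breaks ties by key: a TreeSet treats elements that compare as equal as duplicates, so ordering by value alone would silently drop words that share a count.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical class: a HashMap for lookup by key plus a TreeSet for value order.
class ValueOrderedCounts {
    // Highest value first; ties broken by key so distinct keys never compare as equal.
    private static final Comparator<Map.Entry<String, Long>> BY_VALUE_DESC = (a, b) -> {
        int byValue = Long.compare(b.getValue(), a.getValue());
        return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
    };

    private final Map<String, Long> counts = new HashMap<>();
    private final TreeSet<Map.Entry<String, Long>> ordered = new TreeSet<>(BY_VALUE_DESC);

    private static Map.Entry<String, Long> entry(String key, Long value) {
        return new AbstractMap.SimpleImmutableEntry<String, Long>(key, value);
    }

    /** Inserts or updates a key; the stale (key, oldValue) pair is removed from the set first. */
    void put(String key, long value) {
        Long old = counts.put(key, value);
        if (old != null) {
            ordered.remove(entry(key, old));
        }
        ordered.add(entry(key, value));
    }

    Long get(String key) {
        return counts.get(key);
    }

    void remove(String key) {
        Long old = counts.remove(key);
        if (old != null) {
            ordered.remove(entry(key, old));
        }
    }

    /** Entry with the lowest value, or null if the structure is empty. */
    Map.Entry<String, Long> lastEntry() {
        return ordered.isEmpty() ? null : ordered.last();
    }

    /** Keys from highest value to lowest. */
    List<String> keysInOrder() {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> e : ordered) {
            keys.add(e.getKey());
        }
        return keys;
    }
}
```

put, remove and lastEntry are all O(log n) here, at the cost of storing every key twice.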
obviously, you could always code up something yourself sort of like a LinkedHashMap, where the elements are locatable by key and traversable by order, but that's pretty much going to be entirely custom code (i doubt anything that specific already exists, but i could be wrong).
Try the solution posted on http://paaloliver.wordpress.com/2006/01/24/sorting-maps-in-java/ . You have the flexibility of doing sorting ascending or descending too.
Here is what they say
Outputs
Update: You cannot sort maps by values, sorry.
You can use a SortedMap implementation like TreeMap with a Comparator defining order by values (instead of the default, by keys).
Or, even better, you can put elements into a PriorityQueue with a predefined comparator by values. It should be faster and take less memory compared to TreeMap.
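A small sketch of the PriorityQueue variant, assuming Java 8+. The class LowestFirstQueue and its methods are hypothetical names for illustration. Two caveats that I am fairly sure of: a PriorityQueue only guarantees that the head is the smallest element (iterating over it is not sorted), and changing a value means removing the old entry first, which is a linear-time operation.

```java
import java.util.AbstractMap;
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical wrapper: a min-heap of (key, value) entries ordered by value.
class LowestFirstQueue {
    private static final Comparator<Map.Entry<String, Long>> BY_VALUE =
            (a, b) -> Long.compare(a.getValue(), b.getValue());

    private final PriorityQueue<Map.Entry<String, Long>> queue = new PriorityQueue<>(BY_VALUE);

    private static Map.Entry<String, Long> entry(String key, Long value) {
        return new AbstractMap.SimpleImmutableEntry<String, Long>(key, value);
    }

    void add(String key, long value) {
        queue.add(entry(key, value));
    }

    /** Replaces (key, oldValue) with (key, newValue); the removal scans the queue, so it is O(n). */
    void replace(String key, long oldValue, long newValue) {
        queue.remove(entry(key, oldValue));
        queue.add(entry(key, newValue));
    }

    /** Key with the lowest value without removing it, or null if empty. */
    String lowestKey() {
        Map.Entry<String, Long> head = queue.peek();
        return head == null ? null : head.getKey();
    }

    /** Removes the entry with the lowest value and returns its key, or null if empty. */
    String pollLowestKey() {
        Map.Entry<String, Long> head = queue.poll();
        return head == null ? null : head.getKey();
    }
}
```

This fits best when you mostly need the lowest entry and rarely update existing values.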
You may refer to the implementation of java.util.LinkedHashMap. The basic idea is to use an inner linked list to store the order. Here are some details:
Extend HashMap. In HashMap, each entry has a key and a value; that is the basic part. You can add next and prev pointers to each entry to store the entries in order by value, plus header and tail pointers to get the first and last entry. For every modification (add, remove, update), you can add your own code to change the list order. It is no more than a linear search and a pointer switch.
Sure, add/update will be slow if there are too many entries, because it is a linked list and not an array. But as long as the list is sorted, I believe there are lots of ways to speed up the search.
So here is what you get: a map that is as fast as HashMap when retrieving an entry by key, and a linked list that stores the entries in order.
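Here is a rough sketch of the idea, without any optimization. For brevity it wraps a HashMap instead of extending it, and the names LinkedValueSortedMap, Node, lastKey and keysInOrder are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: hash lookup by key plus a doubly linked list kept sorted by value.
class LinkedValueSortedMap {
    private static final class Node {
        final String key;
        long value;
        Node prev;
        Node next;

        Node(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Node> index = new HashMap<>();
    private Node head; // entry with the largest value
    private Node tail; // entry with the smallest value

    Long get(String key) {
        Node n = index.get(key);
        return n == null ? null : n.value;
    }

    /** Adds or updates an entry, then re-inserts its node at the sorted position. */
    void put(String key, long value) {
        Node n = index.get(key);
        if (n == null) {
            n = new Node(key, value);
            index.put(key, n);
        } else {
            unlink(n);
            n.value = value;
        }
        insertSorted(n);
    }

    void remove(String key) {
        Node n = index.remove(key);
        if (n != null) {
            unlink(n);
        }
    }

    /** Key of the entry with the smallest value, or null if empty. */
    String lastKey() {
        return tail == null ? null : tail.key;
    }

    /** Keys from largest value to smallest. */
    List<String> keysInOrder() {
        List<String> keys = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            keys.add(n.key);
        }
        return keys;
    }

    // Pointer switch: detach the node from its neighbours.
    private void unlink(Node n) {
        if (n.prev != null) { n.prev.next = n.next; } else { head = n.next; }
        if (n.next != null) { n.next.prev = n.prev; } else { tail = n.prev; }
        n.prev = null;
        n.next = null;
    }

    // Linear search from the head for the first node with a smaller value.
    private void insertSorted(Node n) {
        Node cur = head;
        while (cur != null && cur.value >= n.value) {
            cur = cur.next;
        }
        if (cur == null) { // nothing smaller found: append at the tail
            n.prev = tail;
            if (tail != null) { tail.next = n; } else { head = n; }
            tail = n;
        } else { // insert just before the first smaller node
            n.next = cur;
            n.prev = cur.prev;
            if (cur.prev != null) { cur.prev.next = n; } else { head = n; }
            cur.prev = n;
        }
    }
}
```

get and lastKey are constant time here; put is linear in the worst case because of the search in insertSorted.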
We can discuss this further if this solution meets your requirement.
to jtahlborn: As I said, it surely is slow without any optimization. Since we are now talking about performance rather than implementation, lots of things can be done.
One solution is to use a tree instead of a linked list, such as a red-black tree. Then iterate over the tree instead of iterating over the map.
As for the smallest value, that is easier. Just use a member variable to store the smallest entry, and update it whenever an element is added or updated. When an element is deleted, search the tree for the smallest (this is very fast).
If a tree is too complex, it is also possible to use another list/array to mark some positions in the list, for example one marker every 100 elements. Then, when searching, search the position list first and then the real list. This position list also needs to be maintained; it would be reasonable to rebuild it after a certain number of modifications, maybe 100.