I wrote a custom index for a custom table, and it uses 500MB of heap for 500k strings. Only 10% of the strings are unique; the rest are repeats. Every string is of length 4.
How can I optimize my code? Should I use another collection? I tried to implement a custom string pool to save memory:
public class StringPool {
    private static WeakHashMap<String, String> map = new WeakHashMap<>();

    public static String getString(String str) {
        if (map.containsKey(str)) {
            return map.get(str);
        } else {
            map.put(str, str);
            return map.get(str);
        }
    }
}
private void buildIndex() {
    if (monitorModel.getMessageIndex() == null) {
        // the index: one map per filterable column
        ArrayList<HashMap<String, TreeSet<Integer>>> messageIndex = new ArrayList<>(filterableColumn.length);
        for (int i = 0; i < filterableColumn.length; i++) {
            // key -> cell value, value -> TreeSet of the rows which contain that value
            HashMap<String, TreeSet<Integer>> hash = new HashMap<>();
            messageIndex.add(hash);
        }
        // fill the index of every column, row by row
        for (int i = monitorModel.getParser().getMyMessages().getMessages().size() - 1; i >= 0; --i) {
            TreeSet<Integer> tempList;
            for (int j = 0; j < filterableColumn.length; j++) {
                String value = StringPool.getString(getValueAt(i, j).toString());
                if (!messageIndex.get(j).containsKey(value)) {
                    tempList = new TreeSet<>();
                    messageIndex.get(j).put(value, tempList);
                } else {
                    tempList = messageIndex.get(j).get(value);
                }
                tempList.add(i);
            }
        }
        monitorModel.setMessageIndex(messageIndex);
    }
}
You might want to examine your heap in a profiler. My guess is that the memory consumption isn't primarily in the String storage, but in the many TreeSet<Integer> instances. If so, you could optimize considerably by using primitive arrays (int[], short[], or byte[], depending on the actual size of the integer values you're storing). Or you could look into a primitive collection type, such as those provided by FastUtil or Trove.
If you do find that the String storage is problematic, I'll assume that you want to scale your application beyond 500k Strings, or that especially tight memory constraints require you to deduplicate even short Strings.
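To make the primitive-array suggestion above concrete, here is a minimal sketch of a per-column index that keeps row numbers in a growable int[] instead of a TreeSet<Integer>. The class names IntList and ColumnIndex are hypothetical and not part of the question's code, and the sketch assumes Java 8 or later (for computeIfAbsent). It avoids one boxed Integer plus one tree node per row; if rows are always added in monotonic order, the array also stays ordered without needing a tree.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical growable list of primitive ints (no boxing, no tree nodes). */
final class IntList {
    private int[] data = new int[4];
    private int size = 0;

    void add(int value) {
        if (size == data.length) {
            // Grow geometrically so that add() is amortized O(1).
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int size() {
        return size;
    }

    /** Returns a trimmed copy of the stored values, in insertion order. */
    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}

/** Hypothetical index for a single column: cell value -> rows containing it. */
final class ColumnIndex {
    private final Map<String, IntList> rowsByValue = new HashMap<>();

    void add(String value, int row) {
        rowsByValue.computeIfAbsent(value, k -> new IntList()).add(row);
    }

    /** Returns the rows recorded for value, or an empty array if there are none. */
    int[] rowsFor(String value) {
        IntList rows = rowsByValue.get(value);
        return rows == null ? new int[0] : rows.toArray();
    }
}
```

Whether this actually helps depends on what the profiler shows; a library such as FastUtil or Trove offers the same idea with a fuller API.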
As Dev said, String.intern() will deduplicate Strings for you. One caveat, however: in the Oracle and OpenJDK virtual machines, String.intern() will store those Strings in the VM permanent generation, such that they will not be garbage-collected in the future. That's appropriate (and helpful) if:
If either of those conditions is false, you are probably correct to build a custom pool. But my recommendation is that you consider a simple HashMap in place of the WeakHashMap you're currently using. You probably don't want these values to be garbage-collected while they're in your cache, and WeakHashMap adds another level of indirection (and the associated object pointers), increasing memory consumption further.
Update: I'm told that JDK 7 stores interned Strings (String.intern()) in the main heap, not in perm-gen, as earlier JDKs did. That makes String.intern() less risky if you're using JDK 7.
No need to come up with a custom pool. Just use String.intern().
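As a rough illustration of the two options discussed in these answers, the sketch below shows a plain HashMap-backed pool next to String.intern(). The class name SimpleStringPool is hypothetical; the sketch assumes Java 8 or later (for Map.putIfAbsent) and is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool backed by a plain HashMap; not thread-safe. */
final class SimpleStringPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to str, registering str if it is new. */
    String canonical(String str) {
        // putIfAbsent returns the previously stored value, or null if str was new.
        String existing = pool.putIfAbsent(str, str);
        return existing != null ? existing : str;
    }

    int size() {
        return pool.size();
    }
}

final class PoolDemo {
    public static void main(String[] args) {
        // Two distinct objects with equal contents.
        String a = new String("ABCD");
        String b = new String("ABCD");

        SimpleStringPool pool = new SimpleStringPool();
        // Both lookups resolve to the same canonical object.
        System.out.println(pool.canonical(a) == pool.canonical(b));
        // String.intern() gives the same guarantee using the JVM's own table.
        System.out.println(a.intern() == b.intern());
    }
}
```

With either approach the index must store the returned reference rather than the original string; otherwise the duplicates stay reachable and nothing is saved.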