Java collection and memory optimization

Posted 2019-05-14 22:16

I wrote a custom index to a custom table which uses 500MB of heap for 500k strings. Only 10% of the strings are unique; the rest are repeats. Every string is of length 4.

How can I optimize my code? Should I use another collection? I tried to implement a custom string pool to save memory:

import java.util.WeakHashMap;

public class StringPool {

    private static final WeakHashMap<String, String> map = new WeakHashMap<>();

    // Returns the pooled instance equal to str, adding str to the pool if it is new.
    public static String getString(String str) {
        String pooled = map.get(str);
        if (pooled == null) {
            map.put(str, str);
            pooled = str;
        }
        return pooled;
    }
}

private void buildIndex() {
    if (monitorModel.getMessageIndex() == null) {
        // One index per filterable column.
        ArrayList<HashMap<String, TreeSet<Integer>>> messageIndex = new ArrayList<>(filterableColumn.length);
        for (int i = 0; i < filterableColumn.length; i++) {
            // key -> cell value, value -> sorted set of the rows that contain that value
            messageIndex.add(new HashMap<>());
        }
        // Fill the index of every column, row by row.
        for (int i = monitorModel.getParser().getMyMessages().getMessages().size() - 1; i >= 0; --i) {
            for (int j = 0; j < filterableColumn.length; j++) {
                String value = StringPool.getString(getValueAt(i, j).toString());
                HashMap<String, TreeSet<Integer>> columnIndex = messageIndex.get(j);
                TreeSet<Integer> rows = columnIndex.get(value);
                if (rows == null) {
                    rows = new TreeSet<>();
                    columnIndex.put(value, rows);
                }
                rows.add(i);
            }
        }
        monitorModel.setMessageIndex(messageIndex);
    }
}

2 answers
Deceive 欺骗
#2 · 2019-05-14 23:10

You might want to examine your heap in a profiler. My guess is that the memory consumption isn't primarily in the String storage, but in the many TreeSet<Integer> instances. If so, you could optimize considerably by using primitive arrays (int[], short[], or byte[], depending on the actual range of the integer values you're storing). Or you could look into a primitive collection type, such as those provided by fastutil or Trove.
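
As a rough sketch of the primitive-array idea (the class names IntRowList and ColumnIndex are made up for illustration, and it assumes each row is added at most once per value, which is how buildIndex() fills the index), the per-value TreeSet<Integer> could be replaced by a growable int[]. On a typical 64-bit JVM a TreeSet entry plus its boxed Integer costs tens of bytes per row, against 4 bytes per int here:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a growable list of row numbers backed by an int[].
final class IntRowList {
    private int[] rows = new int[4];
    private int size = 0;

    // Appends a row number, doubling the backing array when it is full.
    void add(int row) {
        if (size == rows.length) {
            rows = Arrays.copyOf(rows, size * 2);
        }
        rows[size++] = row;
    }

    // Returns the stored rows in ascending order as a right-sized copy.
    int[] toSortedArray() {
        int[] copy = Arrays.copyOf(rows, size);
        Arrays.sort(copy);
        return copy;
    }
}

// Hypothetical index for one column: cell value -> rows containing that value.
final class ColumnIndex {
    private final Map<String, IntRowList> index = new HashMap<>();

    // Records that the given row contains the given value.
    void add(String value, int row) {
        index.computeIfAbsent(value, k -> new IntRowList()).add(row);
    }

    // Returns the matching rows, or an empty array for an unknown value.
    int[] rowsFor(String value) {
        IntRowList list = index.get(value);
        return list == null ? new int[0] : list.toSortedArray();
    }
}
```

Unlike a TreeSet, this list neither deduplicates nor stays sorted while it is being filled, so it only fits if the rows are sorted once after the index is built (or are appended in order to begin with).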

If you do find that the String storage is problematic, I'll assume that you want to scale your application beyond 500k Strings, or that especially tight memory constraints require you to deduplicate even short Strings.

As Dev said, String.intern() will deduplicate Strings for you. One caveat, however: in the Oracle and OpenJDK virtual machines, String.intern() stores those Strings in the VM's permanent generation (PermGen), so they will not be garbage-collected in the future. That's appropriate (and helpful) if:

  1. The Strings you're storing do not change throughout the life of the VM (e.g., if you read in a static list at startup and use it throughout the life of your application).
  2. The Strings you need to store fit comfortably in the VM permanent generation (with adequate room for classloading and other consumers of PermGen). Update: see below.

If either of those conditions is false, you are probably correct to build a custom pool. But my recommendation is that you consider a simple HashMap in place of the WeakHashMap you're currently using. You probably don't want these values to be garbage-collected while they're in your cache, and WeakHashMap adds another level of indirection (and the associated object pointers), increasing memory consumption further.
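
A minimal sketch of such a HashMap-backed pool might look like the following. SimpleStringPool and canonical() are names invented for this example, and the class is not thread-safe, so it assumes the index is built from a single thread:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replacement for the question's StringPool, backed by a plain HashMap.
final class SimpleStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance equal to str, registering str if it is new.
    String canonical(String str) {
        String existing = pool.putIfAbsent(str, str);
        return existing != null ? existing : str;
    }

    // Number of distinct strings currently held by the pool.
    int size() {
        return pool.size();
    }
}
```

Because the pool is an ordinary object, it can be dropped once the index has been built, at which point only the strings still referenced by the index stay reachable.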

Update: I'm told that JDK 7 stores interned Strings (String.intern()) in the main heap, not in perm-gen, as earlier JDKs did. That makes String.intern() less risky if you're using JDK 7.

太酷不给撩
#3 · 2019-05-14 23:20

No need to come up with a custom pool. Just use String.intern().
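
For illustration, here is a small self-contained sketch (InternDemo and canonical() are hypothetical names). In the question's buildIndex(), the equivalent change would be to call getValueAt(i, j).toString().intern() instead of StringPool.getString(...):

```java
final class InternDemo {
    // Returns the JVM-wide canonical instance for the value's string form.
    static String canonical(Object value) {
        return value.toString().intern();
    }

    public static void main(String[] args) {
        // Two distinct String objects with the same contents...
        String a = canonical(new StringBuilder("ab").append("cd"));
        String b = canonical(new String("abcd"));
        // ...resolve to one shared instance after interning.
        System.out.println(a == b); // prints "true"
    }
}
```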
