Java, HashMaps and using Strings as the keys - doe

Posted 2019-06-26 09:16

Question:

If I have a HashMap that looks like this:

HashMap<String, MyObject>

where the String key is a field in MyObject, does this string value get stored twice?

So when I add entries:

_myMap.put(myObj.getName(), myObj);

Am I using double the String size in terms of memory? Or does Java do something clever behind the scenes?

Thanks

Answer 1:

Java stores references, so the map holds only a second pointer to the same String object, not a second copy of its characters. Even if your string is huge, the extra cost is just the size of one reference.
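A minimal sketch of how one could check this, assuming a hypothetical MyObject class with a name field and a getName() accessor like the one in the question (the class body is an illustration, not the asker's actual code). It compares the key held by the map with the object's field using ==, which tests identity rather than content:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyReferenceDemo {
    // Hypothetical stand-in for the MyObject class from the question.
    static final class MyObject {
        private final String name;

        MyObject(String name) {
            this.name = name;
        }

        String getName() {
            return name; // returns the existing String, no copy is made
        }
    }

    // Returns true when the key stored in the map is the very same
    // String object as the one referenced by the field.
    static boolean keyIsSameInstance(MyObject obj) {
        Map<String, MyObject> map = new HashMap<>();
        map.put(obj.getName(), obj);
        String storedKey = map.keySet().iterator().next();
        return storedKey == obj.getName(); // identity comparison, on purpose
    }

    public static void main(String[] args) {
        MyObject obj = new MyObject("some fairly long name");
        System.out.println(keyIsSameInstance(obj)); // prints "true"
    }
}
```

The map entry and the field point at one String object, so the characters exist once in memory.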



Answer 2:

Unless you're actually creating a new String value in getName(), you're not duplicating your memory usage.

Here are a few examples to clarify things:

 String s1 = "Some really long string!";
 String s2 = s1;
 assert s1.equals(s2);

Here, s1 == s2; they refer to the same String instance. Your memory usage is 2 reference variables (no big deal), 1 String instance, and 1 backing char[] (the part that takes up memory; since Java 9 the backing array is a byte[], but the reasoning is the same).


 String s1 = "Some really long string!";
 String s2 = new String(s1);
 assert s1.equals(s2);

Here, s1 != s2; they refer to different String instances. However, since strings are immutable, the constructor knows that they can share the same character array. Your memory usage is 2 reference variables, 2 String instances (still no big deal, because...), and 1 backing char[].


 String s1 = "Some really long string!";
 String s2 = new String(s1.toCharArray());
 assert s1.equals(s2);

Here, just like before, s1 != s2. This time, however, a different constructor is used, one that takes a char[]. To ensure immutability, toCharArray() must return a defensive copy of its internal array, so that changes to the returned array cannot mutate the String value.

[toCharArray() returns] a newly allocated character array whose length is the length of this string and whose contents are initialized to contain the character sequence represented by this string.

To make matters worse, the constructor must also defensively copy the given array into its internal backing array, again to ensure immutability. This means that as many as 3 copies of the character array may be in memory at the same time! One of those (the temporary array returned by toCharArray()) will be garbage-collected eventually, so your memory usage is 2 reference variables, 2 String instances, and 2 backing char[]! NOW your memory usage is doubled!
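The three cases above can be put side by side in a small sketch. The class and variable names are made up for illustration. Note that == only reveals whether two variables refer to the same String instance; whether two distinct instances share a backing array is an implementation detail that cannot be observed this way.

```java
public class StringIdentityDemo {
    // True when both variables refer to the same String object.
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = "Some really long string!";
        String alias = s1;                               // case 1: second reference
        String copy = new String(s1);                    // case 2: new String instance
        String fromChars = new String(s1.toCharArray()); // case 3: new instance, new array

        System.out.println(sameInstance(s1, alias));     // prints "true"
        System.out.println(sameInstance(s1, copy));      // prints "false"
        System.out.println(sameInstance(s1, fromChars)); // prints "false"

        // All three are equal by content.
        System.out.println(s1.equals(alias) && s1.equals(copy) && s1.equals(fromChars)); // prints "true"
    }
}
```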


So going back to your question: as long as you're not creating a new String value in getName() (i.e. you simply return this.name;), you're fine. If you do even a simple concatenation, however (e.g. return this.firstName + this.lastName;), every call produces a new String, and the key stored in the map is a second copy of the characters, so you will double your memory usage!

The following code illustrates my point:

public class StringTest {
    final String name;
    StringTest(String name) {
        this.name = name;
    }
    String getName() {
        return this.name;      // this one is fine!
    //  return this.name + ""; // this one causes OutOfMemoryError!
    }
    public static void main(String[] args) {
        int N = 10000000;
        String longString = new String(new char[N]); // one very long string
        StringTest test = new StringTest(longString);
        String[] arr = new String[N];                // holds N references
        for (int i = 0; i < N; i++) {
            arr[i] = test.getName();
        }
    }
}

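You should first verify that the above code runs (java -Xmx128m StringTest) without throwing any exception. Then modify getName() to return this.name + ""; and run it again. This time you should get an OutOfMemoryError, because each call now produces a new String instead of returning the existing one (the exact outcome may vary with the JVM version and heap settings).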
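A lighter way to see the same effect without filling the heap is to compare the results of two calls with ==. The sketch below assumes a hypothetical Person class with firstName and lastName fields; getNameConcat() and getNameCached() are illustrative names, not part of the code above. Concatenating non-constant strings yields a newly created String on each call, while a value built once in the constructor is returned as the same instance every time.

```java
public class ConcatIdentityDemo {
    // Hypothetical class used only for illustration.
    static final class Person {
        private final String firstName;
        private final String lastName;
        private final String cachedFullName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.cachedFullName = firstName + lastName; // built once
        }

        String getNameConcat() {
            return firstName + lastName; // builds a new String on every call
        }

        String getNameCached() {
            return cachedFullName;       // returns the same instance every time
        }
    }

    public static void main(String[] args) {
        Person p = new Person("Ada", "Lovelace");
        System.out.println(p.getNameConcat() == p.getNameConcat());      // prints "false"
        System.out.println(p.getNameCached() == p.getNameCached());      // prints "true"
        System.out.println(p.getNameConcat().equals(p.getNameCached())); // prints "true"
    }
}
```

If the name has to be derived, computing it once and storing it in a field lets the map key and the object share a single String.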



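Answer 3:

Strings are immutable, but they are still objects accessed through references (Java passes the reference by value), so the map stores a reference to the same String rather than a copy. It won't take twice as much memory.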