I was asked in an interview to calculate the memory usage for HashMap
and how much estimated memory it will consume if you have 2 million items in it.
For example:
Map<String, List<String>> mp = new HashMap<String, List<String>>();
The mapping is like this: one key as a String and a list of strings as the value.
key value
----- ---------------------------
abc ['hello','how']
abz ['hello','how','are','you']
How would I estimate the memory usage of this HashMap Object in Java?
The short answer
To find out how large an object is, I would use a profiler. In YourKit, for example, you can search for the object and then get it to calculate its deep size. This will give you a fair idea of how much memory would be used if the object were stand-alone, and it is a conservative size for the object.
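If a profiler isn't handy, a similar deep-size number can be obtained programmatically. A minimal sketch, assuming the OpenJDK JOL library (jol-core) is on the classpath and using the map from the question (the class name DeepSizeSketch is just for illustration):

import java.util.*;
import org.openjdk.jol.info.GraphLayout;

public class DeepSizeSketch {
    public static void main(String[] args) {
        Map<String, List<String>> mp = new HashMap<String, List<String>>();
        mp.put("abc", Arrays.asList("hello", "how"));
        mp.put("abz", Arrays.asList("hello", "how", "are", "you"));

        // Walk the object graph reachable from the map and sum the instance
        // sizes, roughly the "deep size" a profiler would report.
        System.out.println(GraphLayout.parseInstance(mp).totalSize() + " bytes");
    }
}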
The quibbles
If parts of the object are re-used in other structures, e.g. String literals, you won't free this much memory by discarding it. In fact, discarding one reference to the HashMap might not free any memory at all.
What about Serialisation?
Serialising the object is one approach to getting an estimate, but it can be wildly off, as the serialisation overhead and encoding differ between memory and a byte stream. How much memory is used depends on the JVM (and whether it uses 32-bit or 64-bit references), but the serialisation format is always the same.
e.g.
In Sun/Oracle's JVM, an Integer can take 16 bytes for the header, 4 bytes for the number and 4 bytes of padding (objects are 8-byte aligned in memory), for a total of 24 bytes. However, if you serialise one Integer, it takes 81 bytes; serialise two Integers and they take 91 bytes. i.e. the size of the first Integer is inflated, and the second Integer takes less than it does in memory.
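A minimal sketch of how to see those serialisation sizes for yourself (the exact byte counts depend on the serialisation stream version, so treat 81/91 as indicative):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

public class SerialisedIntegerSize {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);

        out.writeObject(Integer.valueOf(1));   // first Integer carries the stream and class-descriptor overhead
        out.flush();
        System.out.println("one Integer: " + bytes.size() + " bytes");

        out.writeObject(Integer.valueOf(2));   // second Integer re-uses the class descriptor
        out.flush();
        System.out.println("two Integers: " + bytes.size() + " bytes");
    }
}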
String is a much more complex example. In the Sun/Oracle JVM, it contains 3 int values and a char[] reference. So you might assume it uses a 16-byte header, plus 3 * 4 bytes for the ints, 4 bytes for the char[] reference, 16 bytes for the overhead of the char[], and then two bytes per char, aligned to an 8-byte boundary...
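Putting that reasoning into a rough formula (a sketch of the 32-bit layout described above; the 16-byte headers and 8-byte alignment are the assumptions stated in this answer, not values read from a running JVM):

public class StringSizeEstimate {
    // Rough per-String estimate: object header + 3 ints + char[] reference,
    // plus the char[] itself (overhead + 2 bytes per char), each object
    // rounded up to an 8-byte boundary.
    static long estimateStringSize(int chars) {
        long stringObject = align(16 + 3 * 4 + 4);   // header + 3 ints + char[] reference
        long charArray    = align(16 + 2L * chars);  // array overhead + 2 bytes per char
        return stringObject + charArray;
    }

    static long align(long bytes) {
        return (bytes + 7) / 8 * 8;                  // round up to the next 8-byte boundary
    }

    public static void main(String[] args) {
        System.out.println("\"abc\" is roughly " + estimateStringSize(3) + " bytes");
    }
}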
What flags can change the size?
If you have 64-bit references, the char[] reference is 8 bytes long, resulting in 4 bytes of padding. If you have a 64-bit JVM, you can use -XX:+UseCompressedOops to get 32-bit references. (So the JVM bit size alone doesn't tell you the size of its references.)
If you have -XX:+UseCompressedStrings, the JVM will use a byte[] instead of a char array when it can. This can slow down your application slightly but could improve your memory consumption dramatically. When a byte[] is used, the memory consumed is 1 byte per char. ;) Note: for a 4-char String, as in the example, the size used is the same due to the 8-byte boundary.
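Both flags are passed when launching the JVM, e.g. (UseCompressedStrings only existed in some Java 6 HotSpot releases, and MyApp is just a placeholder main class):

java -XX:+UseCompressedOops -XX:+UseCompressedStrings MyApp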
What do you mean by "size"?
As has been pointed out, HashMap and List are more complex because many, if not all, of the Strings can be re-used, possibly as String literals. What you mean by "size" depends on how the structure is used. i.e. How much memory would the structure use alone? How much would be freed if the structure were discarded? How much memory would be used if you copied the structure? These questions can have different answers.
What can you do without a profiler?
If you can determine that the likely conservative size is small enough, the exact size doesn't matter. The conservative case is likely to be the one where you construct every String and entry from scratch. (I only say likely because a HashMap can have capacity for 1 billion entries even though it is empty, and a String with a single char can be a sub-string of a String with 2 billion characters.)
You can perform a System.gc(), take the free memory, create the objects, perform another System.gc() and see how much the free memory has been reduced. You may need to create the object many times and take an average. Repeat this exercise many times; it can give you a fair idea.
(BTW While System.gc() is only a hint, the Sun/Oracle JVM will perform a Full GC every time by default)
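A sketch of that measurement, using used heap (totalMemory() - freeMemory()) so that heap growth doesn't skew the numbers; the entry count and sample values are made up for illustration:

import java.util.*;

public class FreeMemorySketch {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        // Build the structure you want to measure.
        Map<String, List<String>> mp = new HashMap<String, List<String>>();
        for (int i = 0; i < 100000; i++) {
            mp.put("key" + i, Arrays.asList("hello", "how"));
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();

        // mp is still referenced here, so the second GC cannot collect it.
        System.out.println("approx. " + (after - before) + " bytes for " + mp.size() + " entries");
    }
}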
I think that the question should be clarified, because there is a difference between the size of the HashMap itself and the size of the HashMap plus the objects it contains.
If you consider only the size of the HashMap, then in the example you provided the HashMap stores one reference to the String "abc" and one reference to the List. So the number of elements in the list does not matter: only the reference to the list is stored in the value.
In a 32-bit JVM, one Map entry takes 4 bytes for the "abc" reference + 4 bytes for the List reference + 4 bytes for the "hashcode" int field of the Map entry + 4 bytes for the "next" field of the Map entry.
You also add 4*(X-1) bytes for the remaining (empty) bucket references, where "X" is the number of buckets the HashMap created when you called the constructor new HashMap<String,List<String>>(). According to http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html, the default is 16.
There are also the loadFactor, modCount, threshold and size fields, each 4-byte primitives (16 more bytes), plus the object header (8 bytes).
So in the end, the size of the above HashMap would be roughly 4 + 4 + 4 + 4 + (4*15) + 16 + 8 = 100 bytes.
This is an approximation based on the data owned by the HashMap itself. I think the interviewer was probably interested in seeing whether you were aware of how HashMap works (for example, the fact that the default constructor creates an array of 16 buckets for Map entries, and the fact that the sizes of the objects stored in the HashMap do not affect the HashMap's own size, since it only stores references).
HashMaps are so widely used that, under certain circumstances, it can be worth using the constructors that take an initial capacity and load factor.
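For example, if you expect roughly 2 million entries (as in the question), sizing the map up front avoids repeated resizing; the 0.75f below is just the default load factor made explicit:

// Initial capacity of about 2,000,000 / 0.75, so 2 million entries fit before a resize.
Map<String, List<String>> mp = new HashMap<String, List<String>>(2700000, 0.75f);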
You can't know in advance without knowing what all the strings are, how many items are in each list, and whether the strings are all unique references.
The only way to know for sure is to serialize the whole thing to a byte array (or temp file) and see exactly how many bytes that takes.
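A sketch of that measurement (keeping in mind the earlier caveat that the serialised size and the in-memory size can differ quite a lot):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.*;

public class SerialisedMapSize {
    public static void main(String[] args) throws Exception {
        Map<String, List<String>> mp = new HashMap<String, List<String>>();
        mp.put("abc", new ArrayList<String>(Arrays.asList("hello", "how")));
        mp.put("abz", new ArrayList<String>(Arrays.asList("hello", "how", "are", "you")));

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(mp);   // HashMap, ArrayList and String are all Serializable
        out.close();

        System.out.println(bytes.size() + " bytes serialised");
    }
}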