What is the size of a hash table with 32-bit keys and 32-bit pointers to values stored separately?
Is it going to be 2^32 slots * (4 bytes (key) + 4 bytes (pointer to value)) = 4 * 10^9 * (4 + 4) bytes ≈ 32 GB?
I am trying to understand the space complexity of hash tables.
I think you are asking the wrong question. The space complexity of a data structure indicates how much space it occupies in relation to the number of elements it holds. For example, a space complexity of O(1) would mean that the data structure always consumes constant space no matter how many elements you put into it. O(n) would mean that the space consumption grows linearly with the number of elements in it. A hash table typically has a space complexity of O(n).
So to answer your question: it depends on the number of elements it currently stores, and in the real world also on the actual implementation.
A lower bound for the memory consumption of your hash table is: (number of values to store) * (size of a value). So if you want to store 1 million values in the hash table and each occupies 4 bytes, then it will consume at least 4 million bytes (roughly 4 MB). Real-world implementations usually use a bit more memory for infrastructure, but again: this highly depends on the actual implementation, and there is no way to find out for sure other than to measure it.
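If you do want a concrete number, the most reliable approach is the one suggested above: measure it. Below is a rough, JVM-dependent sketch of such a measurement; the 1 million boxed Integer entries and the use of Runtime/System.gc() are just one possible setup, and the result is only an estimate, not an exact figure.

```java
import java.util.HashMap;
import java.util.Map;

// Rough, JVM-dependent sketch of "just measure it": compare heap usage before
// and after filling a HashMap. Numbers vary with the JVM, GC timing and object
// overhead (boxed Integers, entry objects), so treat the result as an estimate.
public class MeasureHashMap {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, i);   // 1 million entries of 4-byte values (boxed on the heap)
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Approximate bytes used:  " + (after - before));
        System.out.println("Theoretical lower bound: " + (1_000_000L * 4) + " bytes");
    }
}
```

On a typical JVM the measured figure comes out many times larger than the 4 MB lower bound, because each entry carries object headers, boxed keys and values, and bucket references on top of the raw data.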
Still, there is no perfect answer to the question. I am not sure about the exact space occupied. As per my understanding of the issue, the size is dynamic and varies with the size of the input.
That is, we start with a hash table whose size is very small compared to the range of the hash function. Then we insert the input, and as collisions start occurring we dynamically double the hash table size. This, I think, is the reason for the O(n) space complexity. Kindly correct me if I am wrong.
Let's pretend we have a naive hash table where the number of buckets is equal to double the number of elements. That is 2n buckets for n elements, which is O(n).
When the number of elements exceeds half of the number of available buckets, you need to create a new array of buckets of double the size and rehash all the elements to their new locations in the new array of buckets (a minimal sketch of this resize step follows the references below).
References:
HashMap.put
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java#HashMap.put%28java.lang.Object%2Cjava.lang.Object%29
Grepcode is down; you can take a look at the OpenJDK repo here as a better reference: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java
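As a rough illustration of the resize-and-rehash step described above, here is a minimal sketch. It is not the JDK's actual HashMap implementation; the class name NaiveHashTable, the initial capacity of 16 and the chaining-with-lists strategy are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a chained hash table that doubles its bucket array once
// the element count exceeds half the bucket count, rehashing every entry
// into the new array. Not production code; for illustration only.
public class NaiveHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private List<Entry<K, V>>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    public NaiveHashTable() {
        buckets = (List<Entry<K, V>>[]) new List[16];   // assumed initial capacity
    }

    private int indexFor(K key, int capacity) {
        // Reduce the (possibly huge) 32-bit hash value into the small bucket array.
        return (key.hashCode() & 0x7fffffff) % capacity;
    }

    public void put(K key, V value) {
        if (size > buckets.length / 2) {
            resize();   // double the buckets and rehash everything
        }
        int idx = indexFor(key, buckets.length);
        if (buckets[idx] == null) buckets[idx] = new ArrayList<>();
        for (Entry<K, V> e : buckets[idx]) {
            if (e.key.equals(key)) { e.value = value; return; }   // overwrite existing key
        }
        buckets[idx].add(new Entry<>(key, value));
        size++;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        List<Entry<K, V>>[] old = buckets;
        buckets = (List<Entry<K, V>>[]) new List[old.length * 2];
        for (List<Entry<K, V>> bucket : old) {
            if (bucket == null) continue;
            for (Entry<K, V> e : bucket) {
                int idx = indexFor(e.key, buckets.length);
                if (buckets[idx] == null) buckets[idx] = new ArrayList<>();
                buckets[idx].add(e);
            }
        }
    }
}
```

Because the bucket array is only ever a constant factor larger than the number of stored elements, the total space stays proportional to the element count, which is where the O(n) bound comes from.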
Hash tables don't pair each possible hash function value with its own slot. The hash value is reduced modulo the size of a reference vector that is much smaller than the hash function's range. Because that vector size is fixed, it is not considered in the space complexity computation.
Consequently, the space complexity of every reasonable hash table is O(n).
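For illustration, here is a minimal sketch of that modulo reduction. The capacity of 16 and the key string are arbitrary assumptions; the bit-mask variant gives the same slot because the capacity is a power of two, which is similar in spirit to what java.util.HashMap does internally.

```java
// Sketch: a 32-bit hash value is reduced to an index into a small, fixed-size
// bucket array, so the table never needs 2^32 slots.
public class IndexExample {
    public static void main(String[] args) {
        int capacity = 16;                  // bucket array size (assumed, power of two)
        int hash = "some key".hashCode();   // full 32-bit hash value
        int index = (hash & 0x7fffffff) % capacity;   // modulo reduction
        int masked = hash & (capacity - 1);           // same slot when capacity is a power of two
        System.out.println(index + " " + masked);
    }
}
```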
In general, this works out quite well. While the key space may be large, the number of values to store is usually quite easily predictable. Certainly, the amount of memory that is functionally acceptable for data structure overhead is typically obvious.
This is why hash tables are so ubiquitous. They often provide the best data structure for a given task, combining strictly bounded memory overhead with better-than-log2(n) lookup time. I love binary trees, but they don't usually beat hash tables.