I am looking for verification on two distinct but related arguments-- the one above (A) and the one below (B) the first line-comment here in the Q.
(A) The way HashMap is structured is:
a HashMap is backed by a plain table, and a table means direct memory access (DMA): constant-time lookup by index.
The whole idea behind HashMap (or hashing in general) in the first place is to put this constant-time memory access to use for
a.) accessing records by their own data content (the <K,V> pair), not by their location in the table (the index)-- see the sketch right after this list
b.) managing a variable number of records-- a record count that is not fixed up front and may or may not stay constant over the lifetime of the structure.
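To make point (a) concrete, here is a minimal sketch contrasting the two access styles; the class name AccessDemo and the literal values are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class AccessDemo {
    public static void main(String[] args) {
        // Access by location (plain DMA): constant time, but the caller
        // must know that the record lives at index 3.
        String[] table = new String[16];
        table[3] = "value";
        String byIndex = table[3];

        // Access by content: the key itself is hashed into a table index,
        // so lookup stays (expected) constant time with no index in sight.
        Map<String, String> map = new HashMap<>();
        map.put("key", "value");
        String byKey = map.get("key");

        System.out.println(byIndex + " / " + byKey);
    }
}
```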
So, the overall structure of a Java HashMap is:
a table: table // I'm using the identifier used in the HashMap source
each cell of this table is a bucket.
Each bucket is a linked list of type Entry-- i.e., each node of this linked list (the data structure, not Java's LinkedList API) is of type Entry, which in turn holds a <K,V> pair.
When a new pair comes in to be added to the hash, a hashCode is computed for it-- from the key K alone, not the whole <K,V> pair, and it is not guaranteed to be unique. This hashCode is the key to the index of this <K,V> in table-- it tells which bucket this <K,V> will go into. Note: the raw hashCode is "normalized" through the function hash() (in HashMap, for one) to better fit the current length of the table; indexFor() is then used to determine the actual bucket, i.e., the cell of table the <K,V> will go into.
When the bucket is determined, the <K,V> is added at the head of the linked list in that bucket-- as a result, it is now the first entry in this bucket, and the first entry of the linked list that already existed is now the "next" entry pointed to by the newly added one.
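Putting (A) together, here is a hedged sketch of that put path, loosely modeled on the pre-Java-8 HashMap source (Entry, hash(), and indexFor() approximate the real helpers; duplicate-key checks and resizing are omitted to keep it short):

```java
// Minimal sketch of the structure described in (A); loosely modeled on the
// pre-Java-8 java.util.HashMap, not a drop-in replacement for it.
final class SketchMap<K, V> {

    // Each bucket is the head of a singly linked chain of Entry nodes.
    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // the table of buckets

    // "Normalize" the raw hashCode by folding its high bits downward --
    // roughly what HashMap's hash() does.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Map a normalized hash to a bucket; this works because table.length
    // is kept a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Head insertion, as described above: the new entry becomes the first
    // node of its bucket, and the old head becomes its "next".
    // (The real put() also replaces the value when the key already exists.)
    public void put(K key, V value) {
        int i = indexFor(hash(key.hashCode()), table.length);
        table[i] = new Entry<>(key, value, table[i]);
    }
}
```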
//===============================================================
(B) From what I see in HashMap, resizing of the table-- the hash-- is done only upon a decision based on the overall size and threshold: the current number of entries in the entire hash versus the capacity times the load factor.
There is no restructuring or resizing based on individual bucket sizes-- nothing like "resize() when the max. # of entries in a bucket exceeds such-and-such".
It is not probable, but it is possible, that a significant number of entries pile up in a single bucket while the rest of the hash stays pretty much empty.
If that is the case, i.e., there is no upper limit on the size of each bucket, then the hash offers not constant-time but linear-time access-- theoretically, at least: it takes $O(n)$ time to get hold of an entry, where $n$ is the total number of entries. But it shouldn't be that way.
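To see how real that worst case is, here is a small runnable sketch; BadKey and its constant hashCode of 42 are invented for the demo. Every key reports the same hashCode, so every entry lands in one bucket and each get() degenerates into a linear scan of that chain. (For what it's worth, Java 8's HashMap mitigates this by converting oversized buckets into balanced trees, roughly $O(\log n)$ per lookup; the purely chained version described in (A) behaves exactly as argued.)

```java
import java.util.HashMap;
import java.util.Map;

// A deliberately bad key: every instance hashes to the same bucket.
// Comparable is implemented so that Java 8's treeified buckets can
// still order the keys; it changes nothing for the chained case.
final class BadKey implements Comparable<BadKey> {
    private final int id;
    BadKey(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int hashCode() { return 42; } // constant on purpose
    @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
}

public class OneBucketDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        int n = 10_000;
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        // All n entries share one bucket, so each get() walks the chain:
        // O(n) per lookup in a purely chained (pre-Java-8) HashMap.
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.get(new BadKey(i));
        }
        System.out.printf("%d lookups took %d ms%n",
                n, (System.nanoTime() - start) / 1_000_000);
    }
}
```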
//===============================================================
I don't think I'm missing anything in Part (A) above.
I'm not entirely sure of Part (B). It is a significant issue and I'm looking to find out how accurate this argument is.
I'm looking for verification on both parts.
Thanks in advance.
//===============================================================
EDIT:
Fixing a maximum bucket size-- i.e., restructuring the hash whenever the # of entries in a bucket hits a maximum-- would resolve this: the access time would then be plainly constant, in theory and in use.
It isn't a well-structured solution, but it is a quick fix, and it would work just fine for the sake of constant access (a sketch of it follows below).
The hashCodes are likely to be evenly distributed across the buckets, so it isn't very likely that any one bucket will hit the bucket max before the threshold on the overall size of the hash is hit. This is the assumption the current setup of HashMap relies on as well.
Also based on Peter Lawrey's discussion below.
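For completeness, here is a hedged sketch of the quick fix itself-- a toy chained table that resizes as soon as any single bucket exceeds a fixed cap. ToyHash and MAX_BUCKET_SIZE are invented names; nothing like this exists in java.util.HashMap:

```java
// Toy chained hash table that rehashes as soon as ANY bucket exceeds
// MAX_BUCKET_SIZE, instead of using only a global size/threshold check.
final class ToyHash<K, V> {
    private static final int MAX_BUCKET_SIZE = 8; // the proposed per-bucket cap

    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private Node<K, V>[] table = newTable(16);

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int len) {
        return (Node<K, V>[]) new Node[len];
    }

    private int indexFor(Object key, int length) {
        return (key.hashCode() & 0x7fffffff) % length;
    }

    public void put(K key, V value) {
        int i = indexFor(key, table.length);
        int chainLength = 0;
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // replace in place
            chainLength++;
        }
        table[i] = new Node<>(key, value, table[i]); // head insertion, as in (A)
        if (chainLength + 1 > MAX_BUCKET_SIZE) {
            resize(table.length * 2); // per-bucket trigger, not a global one
        }
    }

    public V get(Object key) {
        for (Node<K, V> n = table[indexFor(key, table.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    // Double the table and relink every node into its new bucket.
    private void resize(int newLen) {
        Node<K, V>[] old = table;
        table = newTable(newLen);
        for (Node<K, V> head : old) {
            for (Node<K, V> n = head; n != null; ) {
                Node<K, V> next = n.next;
                int i = indexFor(n.key, table.length);
                n.next = table[i];
                table[i] = n;
                n = next;
            }
        }
    }
}
```

One catch worth noting: if distinct keys genuinely share one hashCode, doubling the table never splits their bucket, so this put() would resize on every further insert into it. The per-bucket cap only buys constant access under the even-distribution assumption above-- presumably part of why HashMap settles for the global size/threshold check.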