I know the difference between Open Addressing and Chaining for resolving hash collisions . Most of the basic hash based data structures like HashSet
,HashMap
in Java primarily use chaining technique. I read that ThreadLocal actually uses a probing scheme . So I want to understand why is open addressing not so much used in Java ? I mean it would be difficult to delete records using that scheme , in the sense that you have to mark those cells with some special handling . However it seems like memory requirement will be low for open addressing scheme.
Edit : I just want to understand the possible major reason/reasons for this design decision . I do not want finer details . Also I would like to know why ThreadLocal uses the lesser common technique of open addressing . I guess the two answers can be related together . So I prefer to ask in the same question itself.
I am currently discussing memory-compact reimplementations of HashMap
and HashSet
with, among others, Doug Lea. This particular question hasn't come up, but here's my first thoughts on the question...
- Chained hash tables degrade reasonably gracefully. Whether it's higher load factors or lots of hash collisions, chaining doesn't degrade nearly as quickly as open addressing can.
- As you've said,
remove
is...not a pleasant operation on open-addressed tables. As a general rule, remove
is the least common operation on hash tables, but there are applications for which it's more common, and bad performance would be noticed.
- I also suspect -- though I don't have much data -- that implementing a "linked" open-addressed hash table would be noticeably more difficult.
LinkedHashMap
is written as a subclass of HashMap
, and borrows most of the implementation details; it's somewhat easier to implement the linked list of entries when the entries are discrete objects -- and at that point, you're already most of the way to a chained implementation.
- Nothing in the spec ties them to this implementation -- they're always free to mess around with it later.
- The JDK collections libraries...don't make memory consumption an especially high priority. Memory is cheap. (You may or may not agree with this, but it's definitely a noticeable trend.)