Suppose I need to store 1000 objects in a HashSet. Is it better to have 1000 buckets containing one object each (by generating a unique hash code for each object), or to have 10 buckets each containing roughly 100 objects?
One advantage of having one object per bucket is that I can save execution cycles on calling the equals() method, right?
Why is it important to have a set number of buckets and to distribute the objects among them as evenly as possible?
What is the ideal object-to-bucket ratio?
Roughly one bucket per element is better for the processor; too many buckets is bad for memory. Java will start with a small number of buckets and automatically increase the capacity of your HashSet once it starts filling up, so you don't really need to care unless your application has performance issues and you've identified a HashSet as the cause.
If you have several elements in each bucket, lookups start taking longer. If you have lots of empty buckets, you're using more memory than you need and iterating over the elements takes longer.
This seems like a premature optimization waiting to happen though - the default constructor is fine in most cases.
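A minimal sketch of that point (class and variable names are mine, not from the question): the default constructor starts small and the set resizes itself as you add elements, so no tuning is needed up front.

```java
import java.util.HashSet;
import java.util.Set;

public class AutoGrowthDemo {
    public static void main(String[] args) {
        // The default constructor starts with a small table (16 buckets)
        // and the set resizes itself as it fills up - no tuning required.
        Set<Integer> numbers = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            numbers.add(i);
        }
        System.out.println(numbers.contains(500)); // true
        System.out.println(numbers.size());        // 1000
    }
}
```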
Object.hashCode() is of type int, so there can only be 2^32 different hash values; that is why you create buckets and distribute objects among them.

Edit: If you were using 2^32 buckets to store 2^32 objects, then get operations would certainly have constant complexity. But when you insert elements one by one to store those 2^32 objects, rehashing takes place: if we use an Object[] as the bucket table, then each time the number of elements exceeds the table's threshold, a new, larger array is created and the elements are copied into it. That process adds cost. This is why the ratio of objects to buckets is kept under control, and HashSet does that itself by using equals() and hashCode() together with a good hashing algorithm.
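As a rough illustration (my own sketch, not part of the answer above), the bucket an object lands in is derived from its hashCode(); with far fewer buckets than 2^32 possible hash values, different objects inevitably share buckets:

```java
import java.util.Arrays;
import java.util.List;

public class BucketIndexDemo {
    // Simplified bucket selection: hash modulo bucket count.
    // (The real HashMap/HashSet uses a power-of-two table size and
    // mixes the hash bits, but the idea is the same.)
    static int bucketIndex(Object o, int numBuckets) {
        return (o.hashCode() & 0x7fffffff) % numBuckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date");
        int numBuckets = 4; // far fewer buckets than possible hash codes
        for (String k : keys) {
            System.out.println(k + " -> bucket " + bucketIndex(k, numBuckets));
        }
    }
}
```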
A HashSet should be able to determine membership in O(1) time on average; the documentation states that it offers constant-time performance for the basic operations, assuming the hash function disperses the elements properly among the buckets.

The algorithm a HashSet uses to achieve this is to retrieve the hash code for the object and use it to find the correct bucket. Then it iterates over the items in that bucket until it finds one that is equal. If the number of items in a bucket grows beyond a constant, lookup takes longer than O(1) time. In the worst case - if all items hash to the same bucket - it takes O(n) time to determine whether an object is in the set.
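To make those lookup steps concrete, here is a hypothetical, much-simplified sketch of a hash-based contains() (illustration only, not the JDK implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified hash set: an array of buckets, each bucket a list.
class TinyHashSet<E> {
    private final List<List<E>> buckets;

    TinyHashSet(int numBuckets) {
        buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private List<E> bucketFor(Object o) {
        // Step 1: use the hash code to pick a bucket.
        return buckets.get((o.hashCode() & 0x7fffffff) % buckets.size());
    }

    void add(E e) {
        if (!contains(e)) {
            bucketFor(e).add(e);
        }
    }

    boolean contains(Object o) {
        // Step 2: scan the bucket, comparing with equals().
        // The fuller the bucket, the more equals() calls are needed.
        for (E e : bucketFor(o)) {
            if (e.equals(o)) {
                return true;
            }
        }
        return false;
    }
}
```

With too few buckets, each list returned by bucketFor() gets long and contains() degrades toward a linear scan.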
There is a space-time tradeoff here. Increasing the number of buckets decreases the chance of collisions; however, it also increases memory requirements. The hash set has two parameters, initialCapacity and loadFactor, that allow you to adjust how many buckets the HashSet should create. The default load factor is 0.75, and this is fine for most purposes, but if you have special requirements you can choose another value.

More information about these parameters can be found in the documentation for HashMap.
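For example (a sketch based on the numbers in the question, not on anything in this answer), to hold 1000 objects without triggering a resize you could size the set up front:

```java
import java.util.HashSet;
import java.util.Set;

public class PresizedHashSet {
    public static void main(String[] args) {
        int expectedElements = 1000;
        float loadFactor = 0.75f; // the default; resize when 75% full

        // Pick a capacity large enough that expectedElements / capacity
        // stays at or below the load factor, so the set never has to
        // rehash while the 1000 objects are added.
        int initialCapacity = (int) Math.ceil(expectedElements / loadFactor);

        Set<String> set = new HashSet<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedElements; i++) {
            set.add("object-" + i);
        }
        System.out.println(set.size()); // 1000
    }
}
```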