public class SomeClass {
    private HashSet<SomeObject> contents = new HashSet<SomeObject>();
    private Set<SomeObject> contents2 = new HashSet<SomeObject>();
}
What's the difference? In the end they are both a HashSet, aren't they? The second one looks just wrong to me, but I have seen it frequently used, accepted and working.
Set is an interface, and HashSet is a class that implements the Set interface.
Declaring the variable as type HashSet means that no other implementation of Set may be used. You may want this if you need specific functionality of HashSet.
If you do not need any specific functionality from HashSet, it is better to declare the variable as type Set. This leaves the exact implementation open to change later. You may find that for the data you are using, a different implementation works better. By using the interface, you can make this change later if needed.
You can see more details here: When should I use an interface in java?
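To make "specific functionality of HashSet" concrete, here is a minimal sketch. The class and method names (DeclaredTypeDemo, copyViaClone, copyViaConstructor) are made up for illustration. It relies on the fact that HashSet has a public clone() method while the Set interface does not declare one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the original question.
public class DeclaredTypeDemo {

    // Declared as HashSet: HashSet-only members such as clone() are reachable.
    @SuppressWarnings("unchecked")
    static HashSet<String> copyViaClone(HashSet<String> source) {
        return (HashSet<String>) source.clone();
    }

    // Declared as Set: only the Set contract is used, so any implementation fits.
    static Set<String> copyViaConstructor(Set<String> source) {
        return new HashSet<>(source);
    }

    public static void main(String[] args) {
        HashSet<String> concrete = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> general = concrete;
        // general.clone();  // would not compile: clone() is not part of the Set interface
        System.out.println(copyViaClone(concrete).equals(copyViaConstructor(general)));
    }
}
```

If the code only needs what copyViaConstructor uses, declaring the variable as Set costs nothing and keeps the choice of implementation open.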
Set is a collection interface that HashSet implements.
The second option is usually the ideal choice as it's more generic.
Since the HashSet class implements the Set interface, it's legal to assign a HashSet to a Set variable. You cannot go the other way, however (assign a Set to a more specific HashSet variable), without an explicit cast.
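A small sketch of both directions of that assignment. The names AssignmentDemo and canNarrowToHashSet are invented for this example. The helper shows that going from Set back to HashSet requires a cast, which can fail at runtime.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original answer.
public class AssignmentDemo {

    // Narrowing needs an explicit cast, and the cast only succeeds if the
    // object really is a HashSet (or a subclass of it) at runtime.
    static boolean canNarrowToHashSet(Set<String> candidate) {
        try {
            HashSet<String> narrowed = (HashSet<String>) candidate;
            return narrowed != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> general = new HashSet<>();   // HashSet to Set: always legal
        // HashSet<String> specific = general;   // does not compile without a cast
        System.out.println(canNarrowToHashSet(general));          // true
        System.out.println(canNarrowToHashSet(new TreeSet<>()));  // false
    }
}
```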
Set is an interface that HashSet implements, so if you do this:
Set<E> mySet = new HashSet<E>();
The object still behaves as a HashSet (through the methods that Set declares), but you also have the flexibility to replace the concrete instance with an instance of another Set class in the future, such as LinkedHashSet or TreeSet, or another implementation.
The first method uses a concrete class as the variable type, so you can only assign a HashSet or one of its subclasses, which is less flexible. For example, TreeSet could not be used if your variable type was HashSet.
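Here is a rough illustration of why you might want to swap implementations later. SwapImplementationDemo and addAndList are made-up names. The method is written once against Set and behaves differently depending on which implementation the caller passes in.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original answer.
public class SwapImplementationDemo {

    // Written against Set only, so any implementation can be passed in.
    static List<String> addAndList(Set<String> target) {
        target.add("pear");
        target.add("apple");
        target.add("fig");
        target.add("pear"); // duplicate, ignored by every Set
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        System.out.println(addAndList(new LinkedHashSet<>())); // [pear, apple, fig] (insertion order)
        System.out.println(addAndList(new TreeSet<>()));       // [apple, fig, pear] (sorted)
        System.out.println(addAndList(new HashSet<>()));       // same elements, unspecified order
    }
}
```

With a parameter of type HashSet, the TreeSet call would not compile.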
This is Item 52 from Joshua Bloch's Effective Java, 2nd Edition.
Refer to Objects by their interfaces
... You should favor the use of interfaces rather than classes to refer to objects. If appropriate interface types exist, then parameters, return values, variables, and fields should all be declared using interface types. The only time you really need to refer to an object's class is when you're creating it with a constructor...
// Usually Good - uses interface as type
List<T> tlist = new Vector<T>();
// Typically Bad - uses concrete class as type!
Vector<T> vec = new Vector<T>();
This practice does carry some caveats - if the implementation you want has special behavior not guaranteed by the generic interface, then you have to document your requirements accordingly.
For example, Vector<T> is synchronized, whereas ArrayList<T> (also an implementer of List<T>) is not, so if your design requires synchronized containers (or requires that they not be synchronized), you would need to document that.
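One way to document such a requirement is sketched below. SynchronizedListDemo and fillConcurrently are hypothetical names. The method accepts any List, so the thread-safety requirement has to live in the documentation and the parameter name, because the List type itself cannot express it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

// Hypothetical demo class, not part of the original answer.
public class SynchronizedListDemo {

    /**
     * Appends to the list from several threads at once and returns its final size.
     * The caller MUST pass a thread-safe List (for example a Vector or a
     * Collections.synchronizedList wrapper); the List interface does not guarantee this.
     */
    static int fillConcurrently(List<Integer> threadSafeList, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    threadSafeList.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return threadSafeList.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(new Vector<Integer>(), 4, 1000));
        System.out.println(fillConcurrently(
                Collections.synchronizedList(new ArrayList<Integer>()), 4, 1000));
    }
}
```

Passing a plain ArrayList would compile just as well, but it could lose elements or throw under concurrent writes. That is the kind of caveat the interface type alone does not capture.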
One thing worth mentioning is that the interface vs. concrete class rule is most important for types exposed in an API, e.g. method parameters or return types. For private fields and local variables it only ensures that you aren't using any methods specific to the concrete implementation (i.e. HashSet), but since those are private, it doesn't really matter.
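A quick sketch of where the rule actually pays off. TagRegistry and its methods are invented for this example. The private field could be declared either way, but the public signatures are what callers depend on.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class, used only to illustrate the point about API types.
public class TagRegistry {

    // Private detail: callers never see this declaration.
    private final Set<String> tags = new HashSet<>();

    // Parameter typed by interface: accepts a List, a TreeSet, or any other Collection.
    public void addAll(Collection<String> newTags) {
        tags.addAll(newTags);
    }

    // Return type is an interface: the backing implementation can change
    // without breaking code that calls this method.
    public Set<String> tags() {
        return Collections.unmodifiableSet(tags);
    }
}
```

Had tags() returned HashSet<String>, every caller would be tied to that class, and switching to a TreeSet later would be a breaking change.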
Another thing is that adding another type reference will slightly increase the size of your compiled class. Most people won't care, but these things add up.