A Set in Java never allows duplicates, but it take

Posted 2020-03-19 02:46

Question:

import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        Set<StringBuffer> set = new HashSet<>();
        set.add(new StringBuffer("abc"));
        set.add(new StringBuffer("abc"));
        set.add(new StringBuffer("abc"));
        set.add(new StringBuffer("abc"));
        System.out.println(set);
    }
}

Output:

[abc, abc, abc, abc]

In the code above, I add a StringBuffer("abc") object several times and the Set accepts every one of them, even though a Set never adds duplicates.

Answer 1:

StringBuffer does not override Object#equals() and Object#hashCode(), so the identity of a StringBuffer instance is based not on the contents of the buffer but on the object's address in memory.*


* That identity is based on an address in memory is not strictly required by the JLS, but is a consequence of a typical Object#hashCode() implementation. From the JavaDoc:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
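A minimal sketch of what the inherited Object#equals() means in practice. The class and method names (StringBufferIdentityDemo, sameByEquals, sameContents) are made up for illustration; the only real APIs used are StringBuffer#equals (inherited from Object), StringBuffer#toString and String#contentEquals.

```java
// Hypothetical demo class: StringBuffer inherits equals() from Object,
// so equality means "same object", not "same characters".
class StringBufferIdentityDemo {
    // Uses the inherited Object#equals(): true only for the very same instance.
    static boolean sameByEquals(StringBuffer a, StringBuffer b) {
        return a.equals(b);
    }

    // Compares the characters the buffers currently hold instead.
    static boolean sameContents(StringBuffer a, StringBuffer b) {
        return a.toString().contentEquals(b);
    }

    public static void main(String[] args) {
        StringBuffer first = new StringBuffer("abc");
        StringBuffer second = new StringBuffer("abc");

        System.out.println(sameByEquals(first, second)); // false: two different objects
        System.out.println(sameByEquals(first, first));  // true: the same object
        System.out.println(sameContents(first, second)); // true: same characters
    }
}
```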



Answer 2:

StringBuffer doesn't override either equals or hashCode - so each object is only equal to itself.

This makes sense as StringBuffer is very much "mutable by design" - and equality can cause problems when two mutable objects are equal to each other, as one can then change. Using mutable objects as keys in a map or part of a set can cause problems. If you mutate one after insertion into the collection, that invalidates the entry in the collection as the hash code is likely to change. For example, in a map you wouldn't even be able to look up the value with the same object as the key, as the first test is by hash code.
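To make the "mutate after insertion" problem concrete, here is a small sketch. It assumes ArrayList as a stand-in for any mutable class whose equals()/hashCode() depend on its current contents (StringBuffer itself can't show this, precisely because it doesn't override them); MutableKeyDemo and insertThenMutate are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo: ArrayList's hashCode() is computed from its elements,
// so changing the list changes its hash code.
class MutableKeyDemo {
    // Inserts the key into a fresh HashSet, then mutates the key in place.
    static Set<List<String>> insertThenMutate(List<String> key) {
        Set<List<String>> set = new HashSet<>();
        set.add(key);    // stored under the hash code of ["a"]
        key.add("b");    // hash code changes after insertion
        return set;
    }

    public static void main(String[] args) {
        List<String> key = new ArrayList<>();
        key.add("a");
        Set<List<String>> set = insertThenMutate(key);

        System.out.println(set.size());        // 1: the entry is still stored
        System.out.println(set.contains(key)); // false: lookup uses the new hash code
        System.out.println(set.remove(key));   // false: so does removal
    }
}
```

The same thing happens with a HashMap key, since HashSet is backed by a HashMap.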

StringBuffer (and StringBuilder) are designed to be very transient objects: create them, append to them, convert them to strings, and then you're done. Any time you find yourself adding them to collections, take a step back and ask whether it really makes sense. Occasionally it might, but usually only when the collection itself is short-lived.
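The create/append/convert lifecycle described above could look like this sketch, assuming what you actually want in the set is the finished text. It uses StringBuilder and stores the resulting String, whose equals() and hashCode() are based on its characters, so duplicates collapse. BuildThenStore and buildAndCollect are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: keep the builder short-lived and store the String it produces.
class BuildThenStore {
    // Builds the same text `copies` times and adds each finished String to the set.
    static Set<String> buildAndCollect(int copies) {
        Set<String> set = new HashSet<>();
        for (int i = 0; i < copies; i++) {
            StringBuilder sb = new StringBuilder(); // create
            sb.append("ab").append('c');            // append
            set.add(sb.toString());                 // convert, then let the builder go
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(buildAndCollect(4)); // prints [abc]
    }
}
```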

You should consider this in your own code when overriding equals and hashCode - it's very rarely a good idea for equality to be based on any mutable aspect of an object; it makes the class harder to use correctly, and can easily lead to subtle bugs which can take a long time to debug.
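One way to follow this advice is to base equality only on the parts of an object that cannot change. The sketch below uses a hypothetical Account class (not from the original answer) whose final id defines equality, while the mutable balance is deliberately left out of equals() and hashCode(), so mutating it after insertion does not break set lookups.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: equality uses only the immutable id, never the mutable balance.
final class Account {
    private final String id;
    private long balance;

    Account(String id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    void deposit(long amount) {
        balance += amount; // mutable state, excluded from equals()/hashCode()
    }

    long balance() {
        return balance;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        return id.equals(((Account) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

class StableKeyDemo {
    public static void main(String[] args) {
        Set<Account> accounts = new HashSet<>();
        Account acct = new Account("A-1", 100);
        accounts.add(acct);

        acct.deposit(50); // does not change the hash code

        System.out.println(accounts.contains(acct));                  // true
        System.out.println(accounts.contains(new Account("A-1", 0))); // true: same id
    }
}
```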



Answer 3:

Have you looked at StringBuffer's equals() method (or rather, the lack of one)? That is where the answer lies.

A Set, or for that matter any hash-based collection, relies on the contract defined by the equals() and hashCode() methods of Object for its behavior.

In your case, since StringBuffer doesn't override these methods, each StringBuffer instance you create is different from every other: new StringBuffer("abc").equals(new StringBuffer("abc")) returns false, just as new StringBuffer("abc") == new StringBuffer("abc") does.

I am curious why someone would add a StringBuffer to a set.



Answer 4:

Most mutable objects don't assume they are the same just because they happen to contain the same data. Because they are mutable, you can change their contents at any time: they might be the same now but not later, or different now but the same later.

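BTW, you shouldn't use StringBuffer if StringBuilder is an option. StringBuilder, its unsynchronized replacement, was added in Java 5, more than ten years ago. Note that StringBuilder doesn't override equals() or hashCode() either, so switching alone won't change the Set behavior.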



Answer 5:

Two StringBuffer objects are distinct objects even when they are created with the same argument. Therefore HashSet simply adds each StringBuffer instead of ignoring it as a duplicate.



Answer 6:

A hash set works with "buckets". It stores values in those buckets according to their hash codes. A bucket can hold several members whose hash codes map to it; the set uses the equals(Object) method to tell those members apart.

So let's say we construct a hash set with 10 buckets, for argument's sake, and add the integers 1, 2, 3, 5, 7, 11 and 13 to it. The hash code for an int is just the int. We end up with something like this:

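  • bucket 0: (empty)
  • bucket 1: 1, 11
  • bucket 2: 2
  • bucket 3: 3, 13
  • bucket 4: (empty)
  • bucket 5: 5
  • bucket 6: (empty)
  • bucket 7: 7
  • bucket 8: (empty)
  • bucket 9: (empty)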

The traditional way to use a set is to check whether a value is a member of it. So when we ask, "Is 11 in this set?", the hash set takes 11 modulo 10, gets 1, and looks in the second bucket (we number buckets from 0, of course).

This makes it really, really fast to see if members belong to a set or not. If we add another 11, the hash set looks to see if it's already there. It won't add it again if it is. It uses the equals(Object) method to determine that, and of course, 11 is equal to 11.
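The bucket walk-through above can be sketched as a toy set. This is a simplification for illustration only, assuming a fixed table of 10 buckets as in the example and non-null values; ToyHashSet is a hypothetical class, and the real java.util.HashSet uses a power-of-two table that resizes and mixes hash bits first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy set with 10 fixed buckets, mirroring the example above.
// Values must be non-null.
class ToyHashSet<T> {
    private static final int BUCKETS = 10;
    private final List<List<T>> buckets = new ArrayList<>();

    ToyHashSet() {
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Choose a bucket from the hash code; floorMod keeps negative hashes in range.
    int bucketIndex(Object value) {
        return Math.floorMod(value.hashCode(), BUCKETS);
    }

    // Search only the value's own bucket, comparing members with equals().
    boolean contains(Object value) {
        for (T member : buckets.get(bucketIndex(value))) {
            if (member.equals(value)) {
                return true;
            }
        }
        return false;
    }

    // Add only if no equal member is already in that bucket.
    boolean add(T value) {
        if (contains(value)) {
            return false;
        }
        buckets.get(bucketIndex(value)).add(value);
        return true;
    }

    int size() {
        int n = 0;
        for (List<T> bucket : buckets) {
            n += bucket.size();
        }
        return n;
    }

    public static void main(String[] args) {
        ToyHashSet<Integer> set = new ToyHashSet<>();
        for (int n : new int[] {1, 2, 3, 5, 7, 11, 13}) {
            set.add(n);
        }
        System.out.println(set.bucketIndex(11)); // 1
        System.out.println(set.add(11));         // false: an equal 11 is already in bucket 1
        System.out.println(set.size());          // 7
    }
}
```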

The hash code for a string like "abc" depends on the characters in that string. When you add a duplicate string "abc", the hash set looks in the right bucket and then uses the equals(Object) method to see whether the member is already there. String's equals(Object) method also depends on the characters, so "abc" equals "abc".
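As a sketch of why equal strings land in the same bucket: the String#hashCode Javadoc specifies the hash as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic. The hypothetical stringHash method below recomputes that formula and checks it against the real method.

```java
// Hypothetical helper reproducing the documented String#hashCode formula.
class StringHashDemo {
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule; int overflow wraps, as specified
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(stringHash("abc"));                     // 96354
        System.out.println(stringHash("abc") == "abc".hashCode()); // true
    }
}
```

Because the hash depends only on the characters, every "abc" string gets the hash 96354 and therefore goes to the same bucket, where equals() then finds the duplicate.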

When you use a StringBuffer, though, each StringBuffer's hash code and equality are based on its object identity. StringBuffer doesn't override the basic equals(Object) and hashCode() methods, so every StringBuffer looks to the hash set like a different object. They're not actually duplicates.

When you print the StringBuffers to the output, you're calling the toString() method on the StringBuffers. That makes them look like duplicate strings, which is why you're seeing that output.

This is also why it's very important to override hashCode() if you override equals(Object), otherwise the Set looks in the wrong bucket and you get some very odd and unpredictable behavior!
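A sketch contrasting the two cases, using hypothetical classes EqualsOnly and EqualsAndHash. With only equals() overridden, two equal objects usually get different identity hash codes, so the set typically keeps both (this is not guaranteed, since it depends on the identity hash codes). With both methods overridden consistently, the duplicate is rejected.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical: overrides equals() but inherits Object#hashCode().
final class EqualsOnly {
    private final String name;

    EqualsOnly(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof EqualsOnly && name.equals(((EqualsOnly) o).name);
    }
}

// Hypothetical: keeps the contract, so equal objects return equal hash codes.
final class EqualsAndHash {
    private final String name;

    EqualsAndHash(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof EqualsAndHash && name.equals(((EqualsAndHash) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

class HashContractDemo {
    public static void main(String[] args) {
        Set<EqualsOnly> broken = new HashSet<>();
        broken.add(new EqualsOnly("abc"));
        broken.add(new EqualsOnly("abc"));
        // Typically 2: the second object is looked up under a different hash.
        System.out.println(broken.size());

        Set<EqualsAndHash> fixed = new HashSet<>();
        fixed.add(new EqualsAndHash("abc"));
        fixed.add(new EqualsAndHash("abc"));
        System.out.println(fixed.size()); // 1
    }
}
```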