equals and hashCode: Is the Objects.hash method broken?

Posted 2019-02-10 10:03

I am using Java 7 and have the class below. I implemented equals and hashCode correctly, but in the main method below equals returns false while hashCode returns the same hash code for both objects. Could I get more eyes on this class to see whether I'm doing anything wrong?

UPDATE: I replaced the line on which I call the Objects.hash method with my own hash function: chamorro.hashCode() + english.hashCode() + notes.hashCode(). It returns a different hash code, which is what hashCode is supposed to do when two objects are different. Is the Objects.hash method broken?

Your help will be greatly appreciated!

import org.apache.commons.lang3.StringEscapeUtils;

public class ChamorroEntry {

  private String chamorro, english, notes;

  public ChamorroEntry(String chamorro, String english, String notes) {
    this.chamorro = StringEscapeUtils.unescapeHtml4(chamorro.trim());
    this.english = StringEscapeUtils.unescapeHtml4(english.trim());
    this.notes = notes.trim();
  }

  @Override
  public boolean equals(Object object) {
    if (!(object instanceof ChamorroEntry)) {
      return false;
    }
    if (this == object) {
      return true;
    }
    ChamorroEntry entry = (ChamorroEntry) object;
    return chamorro.equals(entry.chamorro) && english.equals(entry.english)
        && notes.equals(entry.notes);
  }

  @Override
  public int hashCode() {
    return java.util.Objects.hash(chamorro, english, notes);
  }

  public static void main(String... args) {
    ChamorroEntry entry1 = new ChamorroEntry("Åguigan", "Second island south of Saipan. Åguihan.", "");
    ChamorroEntry entry2 = new ChamorroEntry("Åguihan", "Second island south of Saipan. Åguigan.", "");
    System.err.println(entry1.equals(entry2)); // returns false
    System.err.println(entry1.hashCode() + "\n" + entry2.hashCode()); // returns same hash code!
  }
}

4 answers
萌系小妹纸
Reply #2 · 2019-02-10 10:23

There is no requirement that unequal objects must have different hashCodes. Equal objects are expected to have equal hashCodes, but hash collisions are not forbidden. return 1; would be a perfectly legal implementation of hashCode, if not very useful.

There are only 32 bits worth of possible hash codes, and an unbounded number of possible objects, after all :) Collisions will happen sometimes.
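
A well-known concrete collision, as a minimal sketch (the class name CollisionDemo and the helper collides are made up for illustration): the strings "Aa" and "BB" are unequal yet share a hash code, and a HashMap still keeps them apart because it falls back on equals.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // True when two values are unequal yet report the same hash code.
    static boolean collides(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' == 'B' * 31 + 'B' == 2112, so the two strings collide.
        System.out.println(collides("Aa", "BB")); // prints true

        // A hash-based collection tolerates the collision: equals() decides identity.
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size());    // prints 2
        System.out.println(map.get("BB")); // prints 2
    }
}
```

The collision only costs a little lookup time inside one bucket; it never makes the map return the wrong entry.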

SAY GOODBYE
Reply #3 · 2019-02-10 10:24

Actually, you happened to trigger pure coincidence. :)

Objects.hash happens to be implemented by repeatedly multiplying a running result by 31 and adding the hash code of each given object in turn, while String.hashCode does the same with each of its characters. By coincidence, the difference between the "English" strings you used occurs exactly one position further from the end of the string than the opposite difference between the "Chamorro" strings, so everything cancels out perfectly. Congratulations!

Try with other strings, and you'll probably find that it works as expected. As others have already pointed out, this effect is not actually wrong, strictly speaking, since hash codes may correctly collide even if the objects they represent are unequal. If anything, it might be worthwhile trying to find a more efficient hash, but I hardly think it should be necessary in realistic situations.
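
The cancellation can be checked directly. This is a rough sketch, not part of the original class: the name CancellationDemo and the helper combine are invented here, and combine simply writes out the 31-based recurrence for three fields by hand.

```java
import java.util.Objects;

class CancellationDemo {
    // The 31-based recurrence for three field hashes, written out by hand.
    static int combine(int h1, int h2, int h3) {
        return 31 * (31 * (31 * 1 + h1) + h2) + h3;
    }

    public static void main(String[] args) {
        // \u00C5 is the letter Å, escaped so the source file encoding does not matter.
        String c1 = "\u00C5guigan", c2 = "\u00C5guihan";
        String e1 = "Second island south of Saipan. \u00C5guihan.";
        String e2 = "Second island south of Saipan. \u00C5guigan.";

        // 'g' -> 'h' with two characters after it shifts the hash by +31^2.
        int dChamorro = c2.hashCode() - c1.hashCode();
        // 'h' -> 'g' with three characters after it shifts the hash by -31^3.
        int dEnglish = e2.hashCode() - e1.hashCode();
        System.out.println(dChamorro); // prints 961
        System.out.println(dEnglish);  // prints -29791

        // Combined shift: 31^2 * 961 + 31 * (-29791) = 31^4 - 31^4 = 0.
        System.out.println(31 * 31 * dChamorro + 31 * dEnglish); // prints 0
        System.out.println(Objects.hash(c1, e1, "") == Objects.hash(c2, e2, "")); // prints true
    }
}
```

Because the two shifts are exact opposites once weighted by their field positions, the collision would occur even without int overflow.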

别忘想泡老子
Reply #4 · 2019-02-10 10:27

Because a hash code is a 32-bit int value, collisions (the same hash code for two different objects) are always possible, though they are rare and coincidental. Your example is one such highly coincidental case. Here is the explanation.

When you call Objects.hash, it internally calls Arrays.hashCode() with logic as below:

public static int hashCode(Object a[]) {
    if (a == null)
        return 0;
    int result = 1;
    for (Object element : a)
        result = 31 * result + (element == null ? 0 : element.hashCode());
    return result;
}

For your three-argument hashCode, this expands to:

   31 * (31 * (31 * 1 + hashOfString1) + hashOfString2) + hashOfString3

For your first object, the hash values of the individual strings are:

chamorro --> 1140493257
english --> 1698758127
notes --> 0

And for the second object:

chamorro --> 1140494218
english --> 1698728336
notes --> 0

Notice that the first two values differ between the two objects.

But when it computes the final hash code as:

  int hashCode1 = 31*(31*(31+1140493257) + 1698758127)+0;
  int hashCode2 = 31*(31*(31+1140494218) + 1698728336)+0;

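Coincidentally, both expressions evaluate to the same hash code, 1919283673. The intermediate products overflow int and wrap around to 32 bits, but the overflow is not what causes the collision: the two differences (+961 in the first field, -29791 in the second) cancel exactly, since 31 * 31 * 961 = 31 * 29791.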

Verify the theory yourself using the code segment below:

  public static void main(String... args) {
    ChamorroEntry entry1 = new ChamorroEntry("Åguigan", 
                         "Second island south of Saipan. Åguihan.", "");
    ChamorroEntry entry2 = new ChamorroEntry("Åguihan", 
                         "Second island south of Saipan. Åguigan.", "");
    System.out.println(entry1.equals(entry2)); // returns false
    System.out.println("Åguigan".hashCode());
    System.out.println("Åguihan".hashCode());
    System.out.println("Second island south of Saipan. Åguihan.".hashCode());
    System.out.println("Second island south of Saipan. Åguigan.".hashCode());
    System.out.println("".hashCode());
    System.out.println("".hashCode());
    int hashCode1 = 31*(31*(31+1140493257) + 1698758127)+0;
    int hashCode2 = 31*(31*(31+1140494218) + 1698728336)+0;
    System.out.println(entry1.hashCode() + "\n" + entry2.hashCode()); 
    System.out.println(getHashCode(
                    new String[]{entry1.chamorro, entry1.english, entry1.notes}) 
                    + "\n" + getHashCode(
                    new String[]{entry2.chamorro, entry2.english, entry2.notes})); 
    System.out.println(hashCode1 + "\n" + hashCode2); // returns same hash code!
  }

    public static int getHashCode(Object a[]) {
        if (a == null)
            return 0;
        int result = 1;
        for (Object element : a)
            result = 31 * result + (element == null ? 0 : element.hashCode());
        return result;
    }

If you use different string parameters, you will most likely get different hash codes.
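
As a small illustration of that last point (a sketch only; the class DifferentInputsDemo, the helper hashOf, and the shortened English text are made up for this example): when only the first field differs, nothing is left to cancel the change, so the hash codes differ.

```java
import java.util.Objects;

class DifferentInputsDemo {
    // Hypothetical helper mirroring the hashCode() of the class in the question.
    static int hashOf(String chamorro, String english, String notes) {
        return Objects.hash(chamorro, english, notes);
    }

    public static void main(String[] args) {
        // \u00C5 is the letter Å, escaped so the source file encoding does not matter.
        String english = "Second island south of Saipan.";
        int h1 = hashOf("\u00C5guigan", english, "");
        int h2 = hashOf("\u00C5guihan", english, "");

        // The first field's hash moves by 31^2 and is weighted by another 31^2.
        System.out.println(h2 - h1); // prints 923521
        System.out.println(h1 == h2); // prints false
    }
}
```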

Animai°情兽
Reply #5 · 2019-02-10 10:30

Two unequal objects do not need to have different hashes. The important thing is that two equal objects have the same hash.

I can implement hashCode() like this:

public int hashCode() {
    return 5;
}

and it will stay correct (but inefficient).
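
To see that a constant hash stays correct, here is a minimal sketch (the classes ConstantHashDemo and Key and the helper build are invented for illustration): a HashSet still rejects duplicates and finds its elements, because equals makes the final decision.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    static final class Key {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && name.equals(((Key) o).name);
        }

        @Override
        public int hashCode() {
            return 5; // legal: equal keys still get equal hash codes
        }
    }

    // Adds "a", "b", and a second "a"; the duplicate is rejected through equals().
    static Set<Key> build() {
        Set<Key> set = new HashSet<Key>();
        set.add(new Key("a"));
        set.add(new Key("b"));
        set.add(new Key("a"));
        return set;
    }

    public static void main(String[] args) {
        Set<Key> set = build();
        System.out.println(set.size());                 // prints 2
        System.out.println(set.contains(new Key("b"))); // prints true
        System.out.println(set.contains(new Key("c"))); // prints false
    }
}
```

The cost is speed: every key lands in the same bucket, so each lookup has to compare against the other entries in that bucket instead of jumping almost straight to the right one.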
