Java hashcodes collide in one case and not the oth

2019-05-18 22:22发布

问题:

I tried to write a small program to demonstrate hash collisions in java when only equals is overridden and not hashcode() method. This was to prove the theory that two unequal objects can have the same hashcode. This was for an Interview question where the behaviour was asked.

I created 200,000 objects, stored them in an array and then compared them to see which are duplicates. (For this I am using a nested for loop iterating over an array of objects after the object creation phase.) For around 200,000 objects I get 9 collisions. First one being object at index 196 and 121949. I then go on to print these hash codes to show that both values are the same.

However I am getting some very surprising behavior. If I iterate over the nested for loop and print the first collision of hashcodes, I get same hashcode value

1867750575
1867750575 

for both object at index 196 and 121949.

But if I comment out the nested for loop for detecting all the collisions and directly print hashcode for elements at index 196 and 121949 I get

1829164700
366712642

Note, I am not commenting out the creation of these elements, just the part where I check for collisions.

Why is this happening, even if I dont iterate over them, shouldn't the hashcode be consistent ?

Addendum 1: Is there a source behind this, as far as I know, going by the Birthday principle, if I create 200,000 objects, I must get a collision, how is iterating over each hascode or not changing anything?

Addendum 2: I tried adding another array of 200000 size just to see if the colliding indexes change, they did not, so apparently making changes to the binary with the loop uncommitted does not make any changes. So the hypothesis that changing binary changes hashcode doesn't hold.

Here is my code

import java.util.HashMap;

public class EmployeeFactory {

    private static int counter = 0;
    public int id;
    public String empName;

    EmployeeFactory() {
        id = counter;
        empName = "employee_" + id;
        counter++;
    }

    @Override
    public boolean equals(Object o) {

        // If the object is compared with itself then return true
        if (o == this) {
            return true;
        }

        if (o == null || o.getClass() != this.getClass()) {
            return false;
        }

        EmployeeFactory emp = (EmployeeFactory) o;

        // Compare the data members and return accordingly
        return this.id == emp.id;
    }

    public static void main(String[] args) {

        int Obj_Count = 200000;

        EmployeeFactory objs[] = new EmployeeFactory[Obj_Count];
        for (int i = 0; i < Obj_Count; ++i) {
            EmployeeFactory temp = new EmployeeFactory();
            objs[i] = temp;
        }


//Please try code once un commenting the loop below and once while keeping it commented.
 /*   
        for (int i = 0; i < Obj_Count; ++i)
        {
            for (int j = i + 1; j < Obj_Count; ++j)
            {
                if (objs[i].hashCode() == objs[j].hashCode())
                {
                    System.out.println("Objects with IDs " + objs[i].id
                                     + " and " + objs[j].id + " collided.");
                    System.out.println("Object Is " + i + "and Obj ID is "+ objs[i].id + " Has Hashcode " + objs[i].hashCode());
                    System.out.println("Object Is " + j + "and Obj ID is "+ objs[j].id + " Has Hashcode " + objs[j].hashCode());
                    System.out.println("");
                }
            }
        }
        */

        HashMap<EmployeeFactory, EmployeeFactory> hm = new HashMap<EmployeeFactory, EmployeeFactory>();
        objs[121949].id = objs[196].id;
        hm.put(objs[196], objs[196]);
        hm.put(objs[121949], objs[121949]);
        System.out.println(hm.get(objs[121949]).empName);
        System.out.println(hm.get(objs[196]).empName);

        // checking the hashmap
        System.out.println(hm.get(objs[121949]).hashCode());
        System.out.println(hm.get(objs[196]).hashCode());

        // Checking the array
        System.out.println(objs[121949].hashCode());
        System.out.println(objs[196].hashCode());

    }

}

Comemented Output:

employee_121949
employee_196
1829164700
366712642
1829164700
366712642

Un-commented loop Output

Objects with IDs 196 and 121949 collided.
Object Is 196and Obj ID is 196 Has Hashcode 1867750575
Object Is 121949and Obj ID is 121949 Has Hashcode 1867750575

Objects with IDs 62082 and 145472 collided.
Object Is 62082and Obj ID is 62082 Has Hashcode 2038112324
Object Is 145472and Obj ID is 145472 Has Hashcode 2038112324

Objects with IDs 62354 and 105841 collided.
Object Is 62354and Obj ID is 62354 Has Hashcode 2134400190
Object Is 105841and Obj ID is 105841 Has Hashcode 2134400190

Objects with IDs 68579 and 186938 collided.
Object Is 68579and Obj ID is 68579 Has Hashcode 1872358815
Object Is 186938and Obj ID is 186938 Has Hashcode 1872358815

Objects with IDs 105219 and 111288 collided.
Object Is 105219and Obj ID is 105219 Has Hashcode 651156501
Object Is 111288and Obj ID is 111288 Has Hashcode 651156501

Objects with IDs 107634 and 152385 collided.
Object Is 107634and Obj ID is 107634 Has Hashcode 273791087
Object Is 152385and Obj ID is 152385 Has Hashcode 273791087

Objects with IDs 108007 and 146405 collided.
Object Is 108007and Obj ID is 108007 Has Hashcode 1164664992
Object Is 146405and Obj ID is 146405 Has Hashcode 1164664992

Objects with IDs 135275 and 180997 collided.
Object Is 135275and Obj ID is 135275 Has Hashcode 996371445
Object Is 180997and Obj ID is 180997 Has Hashcode 996371445

Objects with IDs 153749 and 184310 collided.
Object Is 153749and Obj ID is 153749 Has Hashcode 254720071
Object Is 184310and Obj ID is 184310 Has Hashcode 254720071

employee_121949
employee_121949
1867750575
1867750575
1867750575
1867750575

回答1:

Matt Timmermans' answer covers the basic issues fairly well, particularly "You can't expect to have any consistency...between different runs." (+1)

The default Object.hashCode() implementation (also called the identity hash code because it's the same as System.identityHashCode(obj)) is currently, in Hotspot, just a pseudo-random number with a thread-local seed. There hasn't been any dependency on the object's memory address for quite some time. If your program execution is completely deterministic, the hashes will most likely be repeatable.

Also note that the identity hashcode is lazily generated on the first call to Object.hashCode() or System.identityHashCode() and the value is stored in the object so that subsequent calls on this object will return the same value. If you run your collision detector loop in another thread, you'll get completely different hash values, and thus different collisions.



回答2:

When you don't override hashCode(), you get the identity hash code function inherited from class Object.

The identity hash code depends on stuff you can't see that can theoretically change every time you run your program, like where the object ends up in memory, the number of objects created before yours, etc. You just can't expect to have any consistency in identity hash values between different runs of your program or method.

However, if you run exactly the same program twice, and it's not too big, the odds are pretty good that you will end up with the same hashes both times. If you change the program, though, you change how much memory will be consumed by loading and compiling the class, which is very likely to change the identity hashes by changing the location in memory that your objects will go to.