How is this HashSet producing sorted output?

Posted 2020-01-29 15:24

Question:

The following code produces the output [1, 2] even though a HashSet is not sorted.

Set<Integer> set = new HashSet<>();
set.add(2); // autoboxing replaces the deprecated new Integer(2)
set.add(1);
System.out.println(set);

Why is that?

Answer 1:

EDIT: The details below describe the implementation before Java 8; from Java 8 onward they no longer apply as written. This shows why you shouldn't rely on undocumented Java behaviour.


This behaviour has several separate causes:

  • Integers hash to themselves
  • in Java, HashMaps and HashSets are backed by an array
  • they also modify hashes, using the higher bits to modify the lower bits; if the hash is in the range 0..15, it is therefore not modified
  • which bucket an object goes into depends on the lower bits of the modified hash
  • when iterating over a map or a set, the inner table is scanned sequentially

So if you add several small (<16) integers to a HashMap/HashSet, this is what happens:

  • integer i has hashcode i
  • since it's less than 16, its modified hash is also i
  • it lands in bucket no. i
  • when iterating, the buckets are visited sequentially, so if all you stored there are small integers, they will be retrieved in ascending order
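
The steps above can be sketched as a small simulation. This is not the JDK source: `supplementalHash`, `bucketIndex` and `scanOrder` are hypothetical helpers, the bit-mixing is only modeled on the way pre-Java 8 HashMap implementations are usually described, and the 16-bucket table is the default capacity mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSketch {
    // Hypothetical helper: folds higher bits into lower bits (assumed pre-Java 8 style mixing).
    // For 0 <= h < 16 every shifted term is 0, so h comes back unchanged.
    static int supplementalHash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // The bucket is picked from the lower bits; tableLength is assumed to be a power of two.
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    // Returns the keys in the order a sequential scan over the buckets visits them.
    // Simplification: one key per bucket, so colliding keys would overwrite each other.
    static List<Integer> scanOrder(int[] keys, int tableLength) {
        Integer[] table = new Integer[tableLength];
        for (int key : keys) {
            int hash = supplementalHash(Integer.hashCode(key));
            table[bucketIndex(hash, tableLength)] = key;
        }
        List<Integer> order = new ArrayList<>();
        for (Integer slot : table) {
            if (slot != null) {
                order.add(slot);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Insertion order is 2, 1, 4, 3, 0, but the scan visits buckets 0..15 in turn.
        System.out.println(scanOrder(new int[] {2, 1, 4, 3, 0}, 16)); // [0, 1, 2, 3, 4]
    }
}
```

The sorted-looking result comes purely from each small key landing in the bucket with the same number, not from any sorting step.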

Note that if the initial number of buckets is too small, the integers may land in buckets not numbered after them:

HashSet<Integer> set = new HashSet<>(4);
set.add(5); set.add(3); set.add(1);
for(int i : set) {
  System.out.print(i);
}

prints 153.
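
Why 153 and not 135 can be seen in a rough model of a 4-bucket table. The sketch below is an illustration under stated assumptions, not the JDK code: `iterate` is a hypothetical helper, the keys are small enough that hash mixing is ignored, and the `headInsert` flag models the assumption that older implementations put a new entry at the front of its bucket's chain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SmallTableSketch {
    // Simulates chained buckets and returns the keys in bucket-scan order.
    // headInsert == true: newest entry goes to the front of its chain (assumed older behaviour).
    // headInsert == false: newest entry is appended to the end of its chain.
    static String iterate(int[] keys, int tableLength, boolean headInsert) {
        List<Deque<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableLength; i++) {
            table.add(new ArrayDeque<>());
        }
        for (int key : keys) {
            // With 4 buckets only the two lowest bits count: 5 -> 1, 3 -> 3, 1 -> 1.
            Deque<Integer> chain = table.get(key & (tableLength - 1));
            if (headInsert) {
                chain.addFirst(key);
            } else {
                chain.addLast(key);
            }
        }
        StringBuilder out = new StringBuilder();
        for (Deque<Integer> chain : table) {
            for (int key : chain) {
                out.append(key);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] keys = {5, 3, 1};
        System.out.println(iterate(keys, 4, true));  // 153
        System.out.println(iterate(keys, 4, false)); // 513
    }
}
```

In this model 5 and 1 share bucket 1, so their relative order depends only on how the chain is built, which is exactly the kind of detail that can differ between Java versions.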



Answer 2:

As per the documentation, a HashSet does not guarantee any order, so what you're seeing could very well change in a future update of Java.

However, if you're wondering why Java's current implementation of HashSet produces the result you're seeing: the Integer of value 1 hashes to a location in the internal entry table of a HashMap that comes before the location to which 2 hashes (note that a HashSet is really backed by a HashMap with arbitrary values). This makes sense, since the hash code of an Integer object is just its value.
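
As a minimal check of that last point, the snippet below compares an Integer's hash code with its value. The class and method names are made up for illustration; the only library behaviour relied on is that `Integer.hashCode()` returns the wrapped int.

```java
class IntegerHashCheck {
    // True when the boxed Integer's hash code equals the int it wraps.
    static boolean hashesToItself(int value) {
        return Integer.valueOf(value).hashCode() == value;
    }

    public static void main(String[] args) {
        System.out.println(hashesToItself(1));  // true
        System.out.println(hashesToItself(2));  // true
        System.out.println(hashesToItself(-7)); // true
    }
}
```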

In fact, you can see this even if you add more numbers (within a certain range: the size of the entry table, which is 16 by default):

Set<Integer> set = new HashSet<>();
set.add(2);
set.add(1);
set.add(4);
set.add(3);
set.add(0);
System.out.println(set); // prints [0, 1, 2, 3, 4]

Iteration over a HashSet takes place by iterating over the internal entry table, which means items earlier in the table come first.



Answer 3:

A HashSet is an unordered collection. It makes no guarantees about ordering and has no concept of it. See this answer for more details: What is the difference between Set and List?

You can consider using a TreeSet if you need a sorted Set.

There is also a LinkedHashSet for an ordered Set that is not sorted.
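
A short sketch of the two alternatives, using only the documented ordering of each class: TreeSet iterates in ascending order of its elements, and LinkedHashSet iterates in insertion order. The class name `OrderedSets` and its helper methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OrderedSets {
    // Sorted by natural ordering, regardless of insertion order.
    static Set<Integer> sorted(List<Integer> values) {
        return new TreeSet<>(values);
    }

    // Keeps the order in which elements were first inserted.
    static Set<Integer> insertionOrdered(List<Integer> values) {
        return new LinkedHashSet<>(values);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(20, 3, 100, 3); // the duplicate 3 is dropped by both sets
        System.out.println(sorted(input));           // [3, 20, 100]
        System.out.println(insertionOrdered(input)); // [20, 3, 100]
    }
}
```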



Answer 4:

A set in Java is not supposed to be an ordered list. Use an ArrayList instead. Also check the Java Collections API for further reference.
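
One way to read this suggestion, as a sketch: keep the HashSet for uniqueness and copy it into an ArrayList that is sorted explicitly whenever ordered output is needed. The `sortedCopy` helper is hypothetical and not part of the Collections API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SortedCopy {
    // Copies the set into a list and sorts it, so the order no longer depends on HashSet internals.
    static List<Integer> sortedCopy(Set<Integer> set) {
        List<Integer> list = new ArrayList<>(set);
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>(Arrays.asList(42, 7, 1000, 7));
        System.out.println(sortedCopy(set)); // [7, 42, 1000]
    }
}
```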



Tags: java hashset