I just realized that implementing the following algorithm to compute the hash code for a stream is not possible using Stream.reduce(...). The problem is that the initial seed for the hash code is 1, which is not an identity value for the accumulator.
The algorithm for List.hashCode():
int hashCode = 1;
for (E e : list)
hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());
You might be tempted to think that the following is correct, but it isn't, although it will work as long as the stream processing is not split up.
List<Object> list = Arrays.asList(1, null, new Object(), 4, 5, 6);
int hashCode = list.stream().map(Objects::hashCode).reduce(1, (a, b) -> 31 * a + b);
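For illustration, a quick way to see the problem (the exact parallel value depends on how the pipeline happens to be split):

int sequential = list.stream().map(Objects::hashCode).reduce(1, (a, b) -> 31 * a + b);
int parallel = list.parallelStream().map(Objects::hashCode).reduce(1, (a, b) -> 31 * a + b);
// sequential matches list.hashCode(), but parallel will typically differ:
// the seed 1 is fed into every split and the accumulator is not associative,
// so combining the partial results does not reproduce the sequential order.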
It seems that the only sensible way of doing it would be to get the Iterator of the Stream and do normal sequential processing, or to collect it to a List first.
As a first approach I would use the collect-to-a-list solution as long as you don't have performance concerns. That way you avoid reinventing the wheel, if one day the hash algorithm changes you benefit from that, and you are also safe if the stream is parallelized (even if I'm not sure that's a real concern).
The way I would implement it can vary depending on how and when you need to compare your different data structures (let's call it Foo). If you do it manually and only sparsely, a simple static function may be enough:
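For instance, a minimal sketch (the FooSupport name and the idea that Foo exposes its contents as a Stream are assumptions for the example):

import java.util.stream.Collectors;
import java.util.stream.Stream;

public final class FooSupport {
    private FooSupport() {}

    // Collect to a List and reuse List.hashCode(), which applies the
    // 31 * hashCode + elementHash algorithm shown in the question.
    public static int hash(Stream<?> elements) {
        return elements.collect(Collectors.toList()).hashCode();
    }
}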
And use it like this:
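For example, with the hypothetical FooSupport helper sketched above:

// Hash the same elements as in the question's example list.
int hash = FooSupport.hash(Stream.of(1, null, new Object(), 4, 5, 6));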
However, if instances of Foo are themselves stored in a Collection and you need both hashCode() and equals() (from Object) to be implemented, I would wrap it inside a FooEqualable:
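A sketch of what such a wrapper could look like, again assuming a hypothetical elements() accessor on Foo that returns its contents as a Stream:

import java.util.stream.Collectors;

public final class FooEqualable {
    private final Foo foo;

    public FooEqualable(Foo foo) {
        this.foo = foo;
    }

    public Foo unwrap() {
        return foo;
    }

    @Override
    public int hashCode() {
        // elements() is the assumed Stream accessor on Foo; the elements are
        // recollected into a List on every call and List.hashCode() is reused.
        return foo.elements().collect(Collectors.toList()).hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof FooEqualable
            && foo.elements().collect(Collectors.toList())
                 .equals(((FooEqualable) obj).foo.elements().collect(Collectors.toList()));
    }
}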
I'm fully aware that this solution isn't optimized (performance-wise) if multiple calls to hashCode() and equals() are made, but I tend not to optimize unless it becomes a concern.

Holger wrote the right solution. If you want a simple way of doing it, there are two additional possibilities:
1. collect to List and call hashCode()
2. use Stream.iterator()
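Both options as minimal sketches (the helper names are made up for the example):

import java.util.Iterator;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Option 1: collect to a List and reuse its hashCode().
static int hashByCollecting(Stream<?> stream) {
    return stream.collect(Collectors.toList()).hashCode();
}

// Option 2: walk Stream.iterator() and apply the List.hashCode() algorithm by hand.
static int hashByIterating(Stream<?> stream) {
    int hashCode = 1;
    for (Iterator<?> it = stream.iterator(); it.hasNext(); )
        hashCode = 31 * hashCode + Objects.hashCode(it.next());
    return hashCode;
}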
Just as a reminder, the algorithm that List.hashCode() uses:

int hashCode = 1;
for (E e : list)
    hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());

While, at first glance, the hash code algorithm seems to be non-parallelizable due to its non-associativity, it is possible if we transform the function:
((1 * 31 + e₀) * 31 + e₁) * 31 + e₂

to

1 * 31 * 31 * 31 + e₀ * 31 * 31 + e₁ * 31 + e₂

which basically is

1 * 31³ + e₀ * 31² + e₁ * 31¹ + e₂ * 31⁰

or, for an arbitrary List of size n:

1 * 31ⁿ + e₀ * 31ⁿ⁻¹ + e₁ * 31ⁿ⁻² + … + eₙ₋₁ * 31⁰

with the first 1 being the initial value of the original algorithm and eₓ being the hash code of the list element at index x. While the summands are now evaluation-order independent, there is obviously a dependency on the element's position, which we can solve by streaming over the indices in the first place (which works for random access lists and arrays), or solve generally with a collector that tracks the number of encountered objects. The collector can resort to repeated multiplications for the accumulation and has to resort to the power function only for combining results:
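A sketch of such a collector, assuming an int[2] as the mutable state (accumulated hash plus element count) and a small pow31 helper for the power function:

import java.util.Objects;
import java.util.stream.Collector;

// The int[2] holds {accumulated hash, number of elements seen so far}.
static <T> Collector<T, ?, Integer> hashing() {
    return Collector.of(
        () -> new int[2],
        (a, o) -> { a[0] = a[0] * 31 + Objects.hashCode(o); a[1]++; },
        (a, b) -> { a[0] = a[0] * pow31(b[1]) + b[0]; a[1] += b[1]; return a; },
        a -> pow31(a[1]) + a[0]);
}

// 31 raised to the power n, via exponentiation by squaring; int overflow is
// intentional and matches the wrap-around of the original algorithm.
private static int pow31(int n) {
    int result = 1;
    for (int base = 31; n != 0; n >>>= 1, base *= base)
        if ((n & 1) != 0)
            result *= base;
    return result;
}

Used as list.parallelStream().collect(hashing()), it should produce the same value as list.hashCode().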