Caching hashes in Java collections?

2019-03-02 13:39发布

When I implement a collection that uses hashes for optimizing access, should I cache the hash values or assume an efficient implementation of hashCode()?

On the other hand, when I implement a class that overrides hashCode(), should I assume that the collection (i.e. HashSet) caches the hash?

This question is only about performance vs. memory overhead. I know that the hash value of an object should not change.

Clarification: A mutable object would of course have to clear the cached value when it is changed, whereas the collection relies on objects not changing. But this is not relevant for my question.

5条回答
倾城 Initia
2楼-- · 2019-03-02 13:49

On the other hand, when I implement a class that overrides hashcode(), should I assume that the collection (i.e. HashSet) caches the hash?

No, you should not make any assumptions beyond the scope of the class you are writing.

Of course you should try to make your hashCode cheap. If it isn't, and your class is immutable, create the hashCode on initialization or lazily upon the first request (see java.lang.String). If your class is not immutable, I don't see any other option than to re-calculate the hashCode every time.

查看更多
孤傲高冷的网名
3楼-- · 2019-03-02 13:51

I'd say in most cases you can rely on efficient implementations of hashCode(). AFAIK, that method is only invoked on lookup methods (like contains, get etc.) or methods that change the collection (add/put, remove etc.).

Thus, in most cases there shouldn't be any need to cache hashes yourself.

查看更多
劳资没心,怎么记你
4楼-- · 2019-03-02 13:53

Why do you want to cache it? You need to ask objects what their hashcode is while you're working with it to allocate it to a hash bucket (and any objects that are in the same bucket that may have the same hashcode), but then you can forget it.

You could store objects in a wrapper HashNode or something, but I would try implementing it first without caching (just like HashSet et al does) and see if you need the added performance and complexity before going there.

查看更多
相关推荐>>
5楼-- · 2019-03-02 14:06

Considering that java.lang.String caches its hash, i guess that hashcode() is supposed to be fast.

So as first approach, I would not cache hashes in my collection. In my objects that I use, I would not cache hash code unless it is oviously slow, and only do it if profiling tell me so.

If my objects will be used by others, i would probubly consider cachnig hash codes sooner (but needs measurements anyway).

查看更多
我欲成王,谁敢阻挡
6楼-- · 2019-03-02 14:11

When designing Guava's ImmutableSet and ImmutableMap classes, we opted not to cache hash codes. This way, you'll get better performance from hash code caching when and only when you care enough to do the caching yourself. If we cached them ourselves, we'd be costing you extra time and memory even in the case that you care deeply about speed and space!

It's true that HashMap does this caching, but it was HashMap's author (Josh Bloch) who strongly suggested we not follow that precedent!

Edit: oh, also, if your hashCode() is slow, the caching by the collection only addresses half of the problem anyway, as hashCode() still must be invoked on the object passed in to get() no matter what.

查看更多
登录 后发表回答