Can anyone explain why this example is thread safe without volatile?
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
In fact, assuming that the computeHashCode function always returned the same result and had no side effects (i.e., idempotent), you could even get rid of all of the synchronization.
// Lazy initialization 32-bit primitives
// Thread-safe if computeHashCode is idempotent
class Foo {
private int cachedHashCode = 0;
public int hashCode() {
int h = cachedHashCode;
if (h == 0) {
h = computeHashCode();
cachedHashCode = h;
}
return h;
}
// other functions and members...
}
MORE: I get it, we don't care if the value is computed twice (so it is not truly thread safe). I also like to know if new threads created after the hashcode has been calculated is guaranteed to see the new hashcode?
This is walking on a thin ice, but here is the explanation. Visibility problem means that some threads might see old version and some - new one. In our case, some threads see 0
while others - cachedHashCode
.
Threads that call hashCode()
and see cachedHashCode
will just return it (if (h == 0)
condition is not met) and everything works.
But threads that see 0
(despite the cachedHashCode
might have already been computed) will just recompute it again.
In other words, in the worst case scenario, every thread will enter the branch seeing 0
for the first time (like if it was ThreadLocal
).
Since computeHashCode()
is idempotent (very important), both calling it several times (by different threads) and reassign it again to the same variable shouldn't have any side effects.
The important piece of information here is
computeHashCode function always returned the same result
If this is true then the computeHashCode is known to be effectively immutable and since it will always be the same value you would never have a concurrency issue.
As far as making cachedHashCode volatile. It wouldnt make a difference as far as thread-safety goes because you are always assigning and returning a thread-local variable h
which will be a non-zero computedHashCode.
This is called the racy single-check idiom. It's used when computing a value is idempotent (returns the same value every time; corollary: the type must be immutable) and cheap (if it's accidentally recomputed more than once that's okay). This always takes some form along the lines of
class Foo {
private Value cacheField; // field holding the cached value; not volatile!
public Value getValue() {
Value value = cacheField; // very important!
if (value == 0 or null or whatever) {
value = computeValue();
cacheField = value;
}
return value;
}
}
or something more or less equivalent. If your implementation isn't idempotent, or isn't cheap, then you should use another idiom; see Effective Java item 71 for details. But the point is, threads do at most one read to cacheField
, and if they see cacheField
in a state where the value hasn't been computed, they recompute the value.
As mentioned in Effective Java, this is how String.hashCode()
is implemented, for example.
That would only be true if Foo is immutable with respect to fields that contribute to the hashcode. (which is necessary to satisfy "assuming that the computeHashCode function always returned the same result")
Otherwise I don't agree that it would be threadsafe.
- Thread 1 enters Hashcode(), gets half way through it, and pauses just before returning an answer of 5 (for example).
- Thread 2 Enters computeHashCode() and pauses half way through it
- Thread 3 modifies the object in a way that effects the hashcode.
- Thread 1 restarts, sets hashcode to 5 and puts the object in a hashmap as a key.
- Thread 2 restarts, sets the hashcode to 78392.
Thread 1 may or may not be able to find the key in the hash map later, depending on how the jvm chose to handle that cachedHashCode. The jvm has the option to keep separate copies of a non-volatile field if it likes. Volatile only ensures that the jvm doesn't do that and that all threads see the same value at all times.