Java's memory model is based on the "happens-before" relationship, which enforces ordering rules but also leaves room for optimization in the virtual machine's implementation, for example with respect to cache invalidation.
For example, in the following case:
// lock objects shared between the threads
private final Object lockA = new Object();
private final Object lockB = new Object();

// called by thread A
private void method() {
    // code before lock
    synchronized (lockA) {
        // code inside
    }
}

// called by thread B
private void method2() {
    // code before lock
    synchronized (lockA) {
        // code inside
    }
}

// called by thread B
private void method3() {
    // code before lock
    synchronized (lockB) {
        // code inside
    }
}
If thread A calls method() and thread B then tries to acquire lockA inside method2(), the synchronization on lockA requires that thread B observe all changes that thread A made to all of its variables prior to releasing the lock, even the variables that were changed in the "code before lock" section.
On the other hand, method3() uses another lock and therefore doesn't enforce a happens-before relationship. This creates an opportunity for optimization.
My question is: how does the virtual machine implement these complex semantics? Does it avoid a full flush of the cache when it is not needed?
How does it track which variables were changed by which thread and at what point, so that it only loads the cache lines that are actually needed from memory?
You're expecting the JVM to work at too high a level. The memory model intentionally describes only what has to be guaranteed, not how it has to be implemented. Certain architectures have coherent caches that don't need to be flushed at all. Still, there might be actions required when it comes to forbidding the reordering of reads and/or writes beyond a certain point.
But in all cases, these effects are global, as the guarantees are made for all reads and writes, not depending on the particular construct that establishes the happens-before relationship. Recall that all writes happening before the release of a particular lock happen-before all reads after acquiring the same lock.
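As a rough illustration of that guarantee, consider the following sketch (class and field names are invented for this example): a write to a plain field made before releasing a lock becomes visible to any thread that reads that field after acquiring the same lock.

class SharedState {
    private final Object lock = new Object();
    private int plainValue; // deliberately not volatile

    void writer() {
        plainValue = 42;          // write before entering the synchronized block
        synchronized (lock) {
            // releasing 'lock' publishes the earlier write as well
        }
    }

    int reader() {
        synchronized (lock) {     // same lock as in writer()
            return plainValue;    // sees 42 if writer() released the lock before we acquired it
        }
    }
}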
The JVM doesn’t process happens-before relationships at all. It processes code, either by interpreting (executing) it or by generating native code for it. When doing so, it has to obey the memory model by inserting barriers or flushes and by not reordering read or write instructions beyond these barriers. At this point, it usually considers the code in isolation, not looking at what other threads are doing. The effect of these flushes or barriers is always global.
However, having a global effect is not sufficient for establishing a happens-before relationship. This relationship only exists when a thread is guaranteed to commit all writes before the other thread is guaranteed to (re-)read the values. This ordering does not exist when two threads synchronize on different objects or acquire/release different locks.
In the case of volatile variables, you can evaluate the value of the variable to find out whether the other thread has written the expected value and hence committed its writes. In the case of a synchronized block, the mutual exclusion enforces an ordering. So within the synchronized block, a thread can examine all variables guarded by the monitor to evaluate the state, which should be the result of a previous update within a synchronized block using the same monitor.
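In the volatile case, a common publication pattern looks like the following sketch (the names are only illustrative): observing the expected value of the volatile flag guarantees that the writer's earlier plain writes are visible, too.

class VolatilePublication {
    private int payload;              // plain field
    private volatile boolean ready;   // volatile flag

    void writer() {
        payload = 42;   // plain write
        ready = true;   // volatile write: happens-before every read that observes 'true'
    }

    void reader() {
        if (ready) {                       // volatile read
            System.out.println(payload);   // guaranteed to print 42, not 0
        }
    }
}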
Since these effects are global, some developers were misled into thinking that synchronizing on different locks was fine as long as the assumption about a time ordering is "reasonable", but such program code must be considered broken, as it relies on side effects of a particular implementation, especially its simplicity.
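The following sketch (hypothetical names) shows such broken code: it counts on the write becoming visible as a side effect of unrelated synchronization, but no happens-before relationship exists between the two methods.

class BrokenPublication {
    private final Object lockA = new Object();
    private final Object lockB = new Object();
    private int data;

    void writer() {
        synchronized (lockA) {
            data = 42;
        }
    }

    void reader() {
        synchronized (lockB) {        // different lock: no happens-before with writer()
            System.out.println(data); // may legally print 0, no matter how much later it runs
        }
    }
}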
One thing that recent JVMs do is recognize that objects which are purely local, i.e. never seen by any other thread, can't establish a happens-before relationship when synchronizing on them. Therefore, the effects of synchronization can be elided in these cases. We can expect more optimizations in the future…
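A typical case where this applies is a lock object that provably never escapes the method, as in this sketch; a JIT compiler may remove the monitor operations entirely.

private int localLocking() {
    Object lock = new Object();   // never visible to any other thread
    int result;
    synchronized (lock) {         // no other thread can ever synchronize on 'lock',
        result = 42;              // so the JIT may elide the monitor enter/exit
    }
    return result;
}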
How does it track which variables were changed by which thread and at what point, so that it only loads the cache lines that are actually needed from memory?
No. That's not how modern CPUs work.
On every platform that you're likely to see multithreaded Java code running on that is complex enough to have this kind of issue, cache coherency is implemented in hardware. A cache line can be directly transferred from one cache to another without going through main memory. In fact, it would be awful if data had to pass through slow main memory every time it was put down on one core and picked up on another. So the caches communicate with each other directly.
When code modifies a memory address, the cache for that core acquires exclusive ownership of that memory address. If another core wants to read that memory address, the caches will typically share the memory address by direct communication. If either core wants to modify the shared data, it must invalidate the data in the other core's cache.
So these caches are managed by hardware and effectively make themselves invisible at the software level.
However, CPUs do sometimes prefetch reads or post writes (writes that haven't reached the cache yet). Handling these simply requires memory barrier instructions. A memory barrier operates entirely inside the CPU to prevent reordering, delaying, or early execution of memory operations across the barrier. The CPU knows which memory operations are delayed or performed ahead of time, so code doesn't have to keep track of it.
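On the Java side these barriers are normally implied by volatile accesses or monitor operations, but since Java 9 they can also be issued explicitly through java.lang.invoke.VarHandle. The following sketch is only meant to illustrate what such a barrier prevents; in real code the flag would be volatile, because plain cross-thread accesses like these remain a data race at the language level.

import java.lang.invoke.VarHandle;

class BarrierSketch {
    int data;
    boolean flag;   // would be volatile in real code; plain here only to expose the fences

    void producer() {
        data = 42;
        VarHandle.releaseFence();     // writes above cannot be reordered with writes below
        flag = true;
    }

    void consumer() {
        if (flag) {
            VarHandle.acquireFence(); // the read of 'flag' cannot be reordered with the accesses below
            System.out.println(data);
        }
    }
}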