When my program performs a load with acquire semantics, a store with release semantics, or a full fence, it invalidates the CPU's cache.
My question is this: which part of the cache is actually invalidated? Only the cache line that held the variable I accessed with acquire/release, or the entire cache (L1 + L2 + L3 and so on)? And is there any difference here between using acquire/release semantics and using a full fence?
I'm not an expert on this, but I stumbled on this document, which may be helpful: http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2009.04.05a.pdf
When you perform a load without fences or mutexes, the loaded value could potentially come from anywhere, i.e., caches, registers (by way of compiler optimizations), or RAM. From your question, though, you already knew this.
In most mutex implementations, acquiring a mutex always applies a fence, either explicitly (e.g., mfence or another barrier instruction) or implicitly (e.g., the lock prefix on x86). The fence orders the core's own memory accesses, and on cache-coherent hardware the coherence protocol then invalidates stale copies of the affected cache lines in the other caches on the path.
Note that the entire cache isn't invalidated, just the cache lines for the memory locations involved. This also includes the line holding the mutex itself (which is usually implemented as a value in memory).
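To make the acquire/release case concrete, here is a minimal C++ sketch of the usual "publish a value through a flag" pattern. The names `payload`, `ready`, and `run_message_passing` are made up for illustration and do not come from the question. The point is that the language only promises ordering and visibility for what was written before the release store; it says nothing about flushing whole caches.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration: one thread publishes a value,
// another thread waits for it with acquire semantics.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag used to publish the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });

    std::thread consumer([&] {
        // Spin until the release store becomes visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // The acquire load synchronizes with the release store, so the
        // earlier write to payload is guaranteed to be visible here.
        assert(payload == 42);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `run_message_passing()` from `main` should return 42 on any conforming implementation. Only `payload` and `ready` are involved in the handshake, which matches the per-cache-line picture described above.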
Of course, there are architecture-specific details, but this is how it works in general.
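For the full-fence half of the question, a rough C++ counterpart is `std::atomic_thread_fence(std::memory_order_seq_cst)` combined with relaxed atomic accesses. This is only a sketch under that assumption, and the names `payload`, `ready`, and `run_fence_passing` are again hypothetical. Which instruction the fence compiles to (for example mfence on x86) depends on the compiler and target.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper for illustration: same handshake as a
// release/acquire pair, but the ordering comes from standalone fences.
int run_fence_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag accessed with relaxed ordering
    int seen = -1;

    std::thread producer([&] {
        payload = 7;
        // Full fence: the write to payload is ordered before the flag store.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Full fence: the flag load is ordered before the read of payload.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

From the program's point of view the difference from acquire/release is in how many reorderings are forbidden, not in how much cache is touched: the fence constrains all of the thread's surrounding loads and stores rather than a single variable's access.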
Also note that fences aren't the only reason caches get invalidated: an operation on one CPU may require lines in another CPU's cache to be invalidated. Searching for "cache coherence protocols" will turn up a lot of information on this subject.