Can anyone explain what a load buffer is and how it differs from an invalidation queue? Also, what is the difference between store buffers and write-combining buffers? The paper by Paul E. McKenney, http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.07.23a.pdf, explains store buffers and invalidation queues very nicely, but unfortunately it doesn't talk about write-combining buffers.
An invalidate queue is similar to a store buffer, but it's part of the memory system, not the CPU. Basically, it is a queue that keeps track of invalidations and ensures that they complete properly, so that a cache can take ownership of a cache line and then write to that line. A load queue is a speculative structure that keeps track of in-flight loads in an out-of-order processor. For example, the following can occur
A store buffer is a speculative structure that exists in the CPU, just like the load queue, and it allows the CPU to speculate on stores. A write-combining buffer is part of the memory system. It takes a bunch of small writes (think 8-byte writes) and packs them into a single larger transaction (a 64-byte cache line) before sending them to the memory system. These writes are not speculative and are part of the coherence protocol. The goal is to save bus bandwidth. Typically, a write-combining buffer is used for uncached writes to I/O devices (often graphics cards). I/O devices are commonly programmed through a series of 8-byte writes to device registers, and the write-combining buffer allows those writes to be combined into larger transactions when they are shipped out past the cache.
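To make the combining idea concrete, here is a minimal toy model in Python. It is only a sketch, not how any real CPU implements it: the class name `WriteCombiningBuffer`, the `transactions` list standing in for the bus, and the single-buffer design are all assumptions made for illustration. Real hardware typically has several such buffers and also flushes them on fences, interrupts, and similar events, which this model ignores.

```python
LINE_SIZE = 64  # bytes per combined transaction (assumed cache-line size)


class WriteCombiningBuffer:
    """Toy model (hypothetical): merges small writes to one line into one transaction."""

    def __init__(self):
        self.line_addr = None    # base address of the line currently being assembled
        self.pending = {}        # offset within the line -> byte value
        self.transactions = []   # (line base address, bytes sent) standing in for the bus

    def write(self, addr, data):
        # Assumes the write does not cross a line boundary (true for aligned 8-byte writes).
        base = addr - (addr % LINE_SIZE)
        if self.line_addr is not None and base != self.line_addr:
            self.flush()         # a write to a different line evicts the partial one
        self.line_addr = base
        for i, byte in enumerate(data):
            self.pending[addr - base + i] = byte
        if len(self.pending) == LINE_SIZE:
            self.flush()         # the line is full: send it as a single transaction

    def flush(self):
        if self.pending:
            self.transactions.append((self.line_addr, len(self.pending)))
        self.line_addr = None
        self.pending = {}
```

In this model, eight consecutive 8-byte writes starting at a 64-byte-aligned address leave exactly one entry in `transactions`, instead of the eight separate bus transactions that uncombined writes would need. Writes scattered across different lines get no benefit, because each one evicts the previous partial line.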