Implementing a cache modeling framework

Posted 2019-01-27 00:55

Question:

I would like to model the behavior of caches in Intel architectures (LRU, inclusive, K-way associative, etc.). I've read Wikipedia, Ulrich Drepper's great paper on memory, and the Intel Manual Volume 3A: System Programming Guide (chapter 11, which isn't very helpful because it only explains what can be manipulated at the software level). I've also read a bunch of academic papers, but as usual, the authors do not make their code available for replication, even when asked. My question is: is there already a publicly available framework for modeling cache behavior? If not, is there a document detailing the behavior of Intel caches at the deepest level? I could not find one.

Answer 1:

There are plenty of cache simulators out there. Dinero, for example (pun obviously intended), is fairly simple and often used for educational purposes.
Note that this simulator is trace-driven: it consumes a list of memory access addresses and doesn't know how to run a binary. You can produce such traces by running the program under a binary instrumentation tool or an emulator, for example:

  • Pin
  • QEMU
  • Bochs

and so on. Some of these already include cache simulators of their own that you may be able to play with.
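To make "trace-driven" concrete, here is a minimal sketch of the consuming side. It assumes a hypothetical plain-text trace format with one access per line: an operation letter (`R` or `W`) followed by a hexadecimal address. Real tools each emit their own formats, so the parser would need adapting. The 64-byte line size is also an assumption (typical for x86, but kept as a parameter).

```python
from typing import Iterable, Iterator, NamedTuple

LINE_SIZE = 64  # assumed cache line size in bytes


class Access(NamedTuple):
    op: str    # "R" for read, "W" for write (hypothetical trace convention)
    addr: int  # byte address


def parse_trace(lines: Iterable[str]) -> Iterator[Access]:
    """Parse a hypothetical 'OP HEXADDR' text trace, skipping blank lines and '#' comments."""
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        op, addr_str = text.split()
        op = op.upper()
        if op not in ("R", "W"):
            raise ValueError(f"line {lineno}: unknown operation {op!r}")
        yield Access(op, int(addr_str, 16))  # int(..., 16) accepts an optional 0x prefix


def line_address(addr: int, line_size: int = LINE_SIZE) -> int:
    """Drop the byte-offset bits: every byte of one cache line maps to the same value."""
    return addr // line_size


if __name__ == "__main__":
    trace = ["# op addr", "R 0x1000", "W 0x1004", "R 0x1040"]
    for acc in parse_trace(trace):
        print(acc.op, hex(acc.addr), "-> line", hex(line_address(acc.addr)))
```

The first two accesses fall in the same 64-byte line (0x40), so a cache model would see the write as touching a line the read already brought in; the third access lands in the next line (0x41).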

Other simulators model full CPU/system behavior, not just caches, and can therefore run a binary directly. Most of them include a simulated cache system, for example:

  • gem5
  • SimpleScalar
  • Multi2Sim
  • marss86

and many others.

On the other hand, writing your own cache simulator is fairly simple if you work from a memory trace (writing an actual frontend that executes programs is far more complicated). You won't find a very detailed specification of the actual caches in Intel/AMD products, but the basic functionality is described in any computer architecture textbook or even on Wikipedia, and the parameters (size, associativity, coherency policies) are mostly documented in the published guides, though they often change between product generations. Feel free to ask here if you run into a specific question :)
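As an illustration of how little is needed once you have a trace, here is a minimal sketch of one set-associative cache level with true LRU replacement. The assumptions are deliberately simple and not tied to any specific Intel part: write-allocate, reads and writes treated alike, no prefetching, no coherence, and a power-of-two-free modulo set index. The example parameters (32 KiB, 8-way, 64-byte lines) are only typical L1D values used for demonstration.

```python
from collections import OrderedDict
from typing import Tuple


class SetAssociativeCache:
    """Minimal trace-driven model of one set-associative cache level with true LRU."""

    def __init__(self, size_bytes: int, ways: int, line_size: int = 64):
        if size_bytes % (ways * line_size):
            raise ValueError("size must be a multiple of ways * line_size")
        self.ways = ways
        self.line_size = line_size
        self.num_sets = size_bytes // (ways * line_size)
        # One OrderedDict per set: keys are tags, ordered LRU (first) -> MRU (last).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def _split(self, addr: int) -> Tuple[int, int]:
        """Split a byte address into (set index, tag)."""
        line = addr // self.line_size
        return line % self.num_sets, line // self.num_sets

    def access(self, addr: int) -> bool:
        """Simulate one access and return True on a hit."""
        index, tag = self._split(addr)
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)     # promote to most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = None              # fill the new line as MRU
        return False


if __name__ == "__main__":
    cache = SetAssociativeCache(32 * 1024, ways=8)  # 64 sets of 8 ways
    for _ in range(2):
        for i in range(9):
            cache.access(i * 4096)  # stride of 64 sets * 64 B: every access maps to set 0
    print(cache.hits, cache.misses)
```

This prints `0 18`: nine lines cycling through an 8-way set thrash true LRU, so every access misses. Cyclic patterns like this are exactly where real replacement policies that deviate from textbook LRU produce noticeably different results.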

Edit:

Regarding the second part of the question: there's no publicly available documentation of the exact cache implementation in Intel CPUs, but the dry "specs" (size, associativity, policies) are in the optimization guide. Modeling these caches should be straightforward, but there may be some hidden caveats, such as power-down features or specialized LRU behaviors. One such reported example can be found here: http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/ (if it holds, it may be worth implementing for accuracy). Aside from that, I believe these details shouldn't affect the overall behavior much for any practical use.
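As one example of an "LRU-like" policy that behaves differently from true LRU, here is a sketch of tree pseudo-LRU (tree-PLRU), a well-known hardware approximation that keeps only `ways - 1` bits per set. This is not a claim about what any particular Intel cache implements (Intel does not document it, and the linked post describes an adaptive policy on Ivy Bridge's L3); it only shows how a different replacement policy can change which line gets evicted. The class name and interface are hypothetical.

```python
class TreePLRUSet:
    """Tree pseudo-LRU state for one cache set with a power-of-two number of ways."""

    def __init__(self, ways: int):
        if ways < 2 or ways & (ways - 1):
            raise ValueError("ways must be a power of two >= 2")
        self.ways = ways
        # Heap-ordered internal nodes: root is 0, children of node n are 2n+1 and 2n+2.
        # bit == 0 means "the victim is in the left subtree", 1 means "in the right subtree".
        self.bits = [0] * (ways - 1)
        self.tags = [None] * ways

    def touch(self, way: int) -> None:
        """Mark `way` as recently used: point every node on its path away from it."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:  # way is in the left half, so the victim should come from the right
                self.bits[node] = 1
                node, hi = 2 * node + 1, mid
            else:          # way is in the right half, so the victim should come from the left
                self.bits[node] = 0
                node, lo = 2 * node + 2, mid

    def victim(self) -> int:
        """Follow the bits from the root down to the pseudo-least-recently-used way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo

    def access(self, tag) -> bool:
        """Return True on a hit; on a miss, fill an empty way or replace the PLRU victim."""
        if tag in self.tags:
            self.touch(self.tags.index(tag))
            return True
        way = self.tags.index(None) if None in self.tags else self.victim()
        self.tags[way] = tag
        self.touch(way)
        return False


if __name__ == "__main__":
    s = TreePLRUSet(4)
    for t in "ABCD":
        s.access(t)
    s.access("A")  # hit: A becomes most recently used
    s.access("E")  # miss: a victim is needed
    print(s.tags)
```

This prints `['A', 'B', 'E', 'D']`: tree-PLRU evicts C, whereas true LRU on the same sequence (A, B, C, D, A, E) would evict B. Small divergences like this are why matching real hardware miss counts can require modeling the actual policy rather than textbook LRU.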