I've been trying to write a pintool to instrument cache hits and misses in a given program.
I found that there are calls such as INS_IsMemoryRead and INS_IsMemoryWrite to determine whether an instruction is a load or a store.
- Is there a way to determine if the instruction had a cache hit or miss?
- If so, is it also possible to get the number of cycles spent fetching the data from the cache/memory?
It is not possible to do either of these with Pin.
The cache tool, "Memory", which comes with Pin is a very simple functional simulator of caches. In other words, with the cache tool you can simulate how many cache misses the application may have for a given cache organization (size, number of ways, number of cache levels). With a little extra code it would be possible to report the instructions where the cache misses happen and later map those instructions back to the source code. However, the hit/miss results from the simulation may not match what happens on a real system, even when the simulator is configured with the same cache organization as that system.
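To make "functional simulator" concrete, here is a minimal sketch of the idea in plain C++. It is not the code of the tool shipped with Pin; the class name `SimpleCache` and its interface are hypothetical, and it assumes a single set-associative level with LRU replacement. It only counts hits and misses, and it also records misses per instruction address so they could be mapped back to source later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical functional model of one set-associative cache level with LRU
// replacement. It counts hits and misses only and models no timing at all.
class SimpleCache {
public:
    SimpleCache(std::size_t numSets, std::size_t ways, std::size_t lineBytes)
        : ways_(ways), lineBytes_(lineBytes), sets_(numSets) {
        assert(numSets > 0 && ways > 0 && lineBytes > 0);
    }

    // Simulate one access by the instruction at `pc` to address `addr`.
    // Returns true on a hit. An access that straddles two lines is treated
    // as touching only the first one (a simplifying assumption).
    bool access(std::uint64_t pc, std::uint64_t addr) {
        const std::uint64_t line = addr / lineBytes_;
        std::vector<std::uint64_t>& set = sets_[line % sets_.size()];

        for (std::size_t i = 0; i < set.size(); ++i) {
            if (set[i] == line) {
                // Hit: move the line to the most-recently-used end.
                set.erase(set.begin() + static_cast<std::ptrdiff_t>(i));
                set.push_back(line);
                ++hits_;
                return true;
            }
        }

        // Miss: evict the least-recently-used line if the set is full.
        if (set.size() == ways_) {
            set.erase(set.begin());
        }
        set.push_back(line);
        ++misses_;
        ++missesByPc_[pc];
        return false;
    }

    std::uint64_t hits() const { return hits_; }
    std::uint64_t misses() const { return misses_; }

    // Simulated miss count per instruction address.
    const std::map<std::uint64_t, std::uint64_t>& missesByPc() const {
        return missesByPc_;
    }

private:
    std::size_t ways_;
    std::size_t lineBytes_;
    // Each set holds line numbers ordered from least to most recently used.
    std::vector<std::vector<std::uint64_t>> sets_;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
    std::map<std::uint64_t, std::uint64_t> missesByPc_;
};
```

In a pintool you would presumably call `access` from an analysis routine attached to memory instructions, for example via INS_InsertPredicatedCall with IARG_INST_PTR and IARG_MEMORYREAD_EA as arguments. The counts are still simulated ones, not what the hardware actually did.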
Another limitation of the cache tool is that it is single-threaded, so you cannot use it for multi-threaded applications.
In addition, it is impossible to get any timing information, such as the number of cycles it takes to service a cache miss. This is very architecture dependent, and I am not aware of a tool that can provide this information from the real system. People use CPU timing simulators instead. Examples are gem5 (http://www.gem5.org/) and MARSS, which is based on PTLsim (http://marss86.org/).