On modern x86 CPUs, hardware prefetching is an important technique to bring cache lines into various levels of the cache hierarchy before they are explicitly requested by the user code.
The basic idea is that when the processor detects a series of accesses to sequential or strided-sequential1 locations, it will go ahead and fetch further memory locations in the sequence, even before executing the instructions that (may) actually access those locations.
My question is if the detection of a prefetch sequence is based on the full addresses (the actual addresses requested by user code) or the cache line addresses which is pretty much the address excluding the bottom 6 bits2 stripped off.
For example, on a system with a 64-bit cache line, accesses to full addresses 1, 2, 3, 65, 150
would access cache lines 0, 0, 0, 1, 2
.
The difference could be relevant when a series of accesses is more regular in the cache line addressing than the full addressing. For example, a series of full addresses like:
32, 24, 8, 0, 64 + 32, 64 + 24, 64 + 8, 64 + 0, ..., N*64 + 32, N*64 + 24, N*64 + 8, N*64 + 0
might not look like a strided sequence at the full address level (indeed it might incorrectly trigger the backwards prefetcher since each subsequence of 4 accesses looks like an 8-byte strided reverse sequence), but at the cache line level it looks like its going forwards a cache line a time (just like the simple sequence 0, 8, 16, 24, ...
).
Which system, if either, is in place on modern hardware?
Note: One could imagine also that the answer wouldn't be based on every access, but only accesses which miss in the some level of the cache that the prefetcher is observing, but then the same question still applies to the filtered stream of "miss accesses".
1Strided-sequential just means that accesses that have the same stride (delta) between them, even if that delta isn't 1. For example, a series of accesses to locations 100, 200, 300, ...
could be detected as strided access with a stride of 100, and in principle the CPU will fetch based on this pattern (which would mean that some cache lines might be "skipped" in the prefetch pattern).
2 Here assuming a 64-bit cache line.