At this web link:
http://www.7-cpu.com/cpu/IvyBridge.html
it says the latency for Ivy Bridge L1 cache access is:
- L1 Data Cache Latency = 4 cycles for simple access via pointer
- L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).
By "simple", did they mean that the pointer size is the same as the word size? So if the pointer is 32-bit and it's a 32-bit OS, this would be "simple"; otherwise it would cost the "complex" latency?
I just don't quite understand their explanation for the difference in the two latencies.
The full x86 effective address looks like displacement + base + index * scale (where displacement is a constant, base and index are registers, and scale is 1, 2, 4 or 8).
It sounds like they call an address simple if only the displacement is present (or maybe additionally the base term), while having index * scale would certainly fall under the complex category.
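To make the two cases concrete, here is a minimal C sketch of the two kinds of dependent-load loops a latency benchmark might use. The function names are hypothetical, and the instructions shown in the comments are only what a typical x86-64 compiler could emit for each loop; the actual code generation depends on the compiler and flags, so check the disassembly before drawing conclusions.

```c
#include <stddef.h>

/* Pointer chasing: each load's address is just the previous result.
 * A compiler could plausibly emit "mov rax, [rax]" here, i.e. a
 * base-only address, which is the "simple" case. */
void *chase_pointer(void **p, size_t steps)
{
    while (steps--)
        p = (void **)*p;        /* address = base */
    return p;
}

/* Index chasing, as in the quoted "n = p[n]": the address has to be
 * computed as p + n * sizeof(size_t). A compiler could plausibly emit
 * "mov rax, [rdi + rax*8]", i.e. base + index * scale, which is the
 * "complex" case. */
size_t chase_index(const size_t *p, size_t n, size_t steps)
{
    while (steps--)
        n = p[n];               /* address = base + index * scale */
    return n;
}
```

In both loops every load depends on the previous one, so the time per iteration is dominated by load latency rather than throughput. Neither case has anything to do with the pointer width matching the OS word size.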
Update: Indeed, the Intel optimization manual has this statement (for Sandy Bridge, though): "The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles." See also Table 2-12, "Effect of Addressing Modes on Load Latency".