The documentation for PREFETCHT2, which is prefetch with the T2 hint, says (emphasis mine):
T0 (temporal data)—prefetch data into all levels of the cache hierarchy.
T1 (temporal data with respect to first level cache misses)—prefetch data into level 2 cache and higher.
T2 (temporal data with respect to second level cache misses)—prefetch data into level 3 cache and higher, or an implementation-specific choice.
NTA (non-temporal data with respect to all cache levels)—prefetch data into non-temporal cache structure and into a location close to the processor, minimizing cache pollution.
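For context, here is a minimal sketch of how the T2 hint is typically issued from C, assuming an x86 compiler that provides `_mm_prefetch` and `_MM_HINT_T2` in `<xmmintrin.h>`. The function name `sum_with_t2_prefetch` and the distance `PREFETCH_AHEAD_ELEMS` are hypothetical, chosen only for illustration. It shows how the hint is requested, not which cache level the line actually ends up in.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants */

/* Hypothetical prefetch distance in elements; an arbitrary illustrative value. */
#define PREFETCH_AHEAD_ELEMS 64

/* Sum an array while issuing PREFETCHT2 for an element further ahead. */
long sum_with_t2_prefetch(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_AHEAD_ELEMS < n) {
            _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD_ELEMS],
                         _MM_HINT_T2);
        }
        total += data[i];
    }
    return total;
}
```

Swapping `_MM_HINT_T2` for `_MM_HINT_T0`, `_MM_HINT_T1`, or `_MM_HINT_NTA` selects the other hints listed above. The hint must be a compile-time constant, which is why it is not passed as a function argument here.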
An earlier version of the document had identical text for T1 and T2, indicating that they did the same thing.
So on modern Intel and AMD processors, does T2 actually fetch into the L3 (and not the L2)? Or does the "implementation-specific choice" come into play?