I am trying to understand how lmbench measures latency for L1, L2 and main memory.
The man page for lat_mem_rd describes the method, but it's not clear to me:
The benchmark runs as two nested loops. The outer loop is the stride size. The inner loop is the array size. For each array size, the benchmark creates a ring of pointers that point forward one stride. Traversing the array is done by
p = (char **)*p;
in a for loop (the overhead of the for loop is not significant; the loop is an unrolled loop 1000 loads long). The loop stops after doing a million loads.
How do you "create a ring of pointers that point forward one stride"? Wouldn't this mean that if the stride size was 128 bytes, you would need to build a linked list with each node separated by exactly 128 bytes from its predecessor? malloc just returns a block at some arbitrary address, so I don't see how that's possible in C. And when I tried the `p = (char **)*p;` line on its own, I always got a segmentation fault (what is p supposed to be initialized with in the first place?).
There is a similar thread on SO (link), and the first answer discusses this, but it does not explain how the strided approach can be combined with a linked list. I also looked at the source itself (lat_mem_rd.c) but couldn't work it out from that either.
Any help is appreciated.