How does lmbench measure L1 and L2 cache latencies?

Posted 2020-07-22 16:30

Question:

I am trying to understand how lmbench measures latency for L1, L2 and main memory.

The man page for lat_mem_rd mentions the method, but it's not clear to me:

The benchmark runs as two nested loops. The outer loop is the stride size. The inner loop is the array size. For each array size, the benchmark creates a ring of pointers that point forward one stride. Traversing the array is done by

p = (char **)*p;

in a for loop (the overhead of the for loop is not significant; the loop is an unrolled loop 1000 loads long). The loop stops after doing a million loads.

How do you "create a ring of pointers that point forward one stride"? Wouldn't this mean that if the stride size was 128 bytes, you would need to make a linked list with each node separated by exactly 128 bytes from its predecessor? malloc just returns some arbitrary free piece of memory, so I don't see how that's possible in C. And with the snippet above, I always get a segmentation fault (I tested it — and what is p supposed to be initialized with?).

There is a similar thread on SO (link) whose first answer discusses this, but it does not explain how the strided approach can be used with linked lists. I also looked at the source code itself (lat_mem_rd.c) but couldn't understand it from that either.

Any help is appreciated.

Answer 1:

You can allocate one large chunk of memory and then arrange the elements of the linked list within that block at any offsets you want. Because the node addresses are computed from the block's base pointer rather than returned by separate malloc calls, you have full control over the spacing between nodes, and p is initialized to point at the first node before the traversal loop starts.