I'm trying to figure out how to calculate the miss rate of an array. I have the answer, but I'm not understanding how the answer was arrived at. I have the following code:
int C[N1][N2];
int A[N1][N3];
int B[N3][N2];
initialize_arrays(A, B, C, N1, N2, N3);
for(i=0; i<N1; ++i)
for(j=0; j<N2; ++j)
for(k=0; k<N3, ++k)
C[i][j] += A[i][k] * B[k][j];
I also have the following info: N1=N2=N3=2048
(what does this mean??). The processor has an L1 data cache of 32kB with line size of 64B (no L2 cache). (what is line size?)
I know the miss rate of array C is (N^2/16)/N^3
. I know the formula is (total misses)/(total accesses)
. I see that there are N^3
total accesses, but how did they get the total misses?
I also know the miss rate of array B: (N^3)/(N^3)
and A: (N^2/16)/N^3
. Could someone please explain to me how they got the total misses here too?
Cache is always accessed / managed with the granularity of a line. Here the line size is 64B. That means that in one cache line there will be 16 elements in one cache line (64B / 4B = 16). Next thing that is important here is every 17th element is in a new line, and once a line is brought to cache, it can possibly give 15 hits.
Having got that out of the way, let us look at the access pattern. A[0][0], B[0][0], C[0][0], A[0][1], B[1][0], C[0][0], A[0][2], B[2][0], C[0][0], ...... A[0][16], B[16][0], C[0][0], A[0][17], B[17][0], C[0][0], ...... and so on
Now Since a line has 16 elements and it is safe to assume a row-major placement of elements in the memory A[0][0] - A[0][15] will belong to one the first cache line (let us call this AL1) and similarly let B[0][0] - B[0][15] belong to line BL1. From this information we can write the access pattern in terms of cache lines. AL1, BL1, CL1, AL1, BL129, CL1, AL1, BL257, CL1, ..... AL1, BL2049, CL1, AL2, BL2176, CL1,.... and so on.
Now the cache size is 32kB that means the cache can hold 512 lines. Since this is L1, let us assume it is a two-way associative cache. Therefore there are 256 sets. This means that after every 256 lines of an array, the 257th line maps to the same set as the 1st line.
The cache contents upon access to lines mapping to set 1 looks like this
1st Iteration - inner
2nd Iteration - inner
3rd Iteration - inner
4th Iteration - Inner
5th Iteration - Inner
. . .
16th Iteration - Inner
17th Iteration - Inner
18th Iteration - Inner
For the 1st iteration of the middle loop. We have for set 1,
for 2nd - 16th iteration of the middle loop,
for 17th - 2048th iteration of the middle loop,
To summarize - for 1 iteration of the outer loop, we have for set 1,
In the outer loop, we can see that every alternate iteration will have the same hit / miss / access pattern as the 1st iteration (summarized above) for arrays C and A. Every iteration of the outer loop will have the same hit / miss / access pattern a the 1st iteration for array B.
Hence, (Finally!)
For Set 1, at the end of the program, we have
The access pattern will be similar across all sets. Therefore by the end of the program we have, across all sets
I know this is different from what is indicated in the question. I could not figure out how. I will be glad to hear other responses.