I want to measure the main memory bandwidth and while looking for the methodology, I found that,
- many used '
bcopy
' function to copy bytes from a source to destination and then measure the time which they report as the bandwidth. - Others ways of doing it is to allocate and array and walk through the array (with some stride) - this basically gives the time to read the entire array.
I tried doing (1) for data size of 1GB and the bandwidth I got is '700MB/sec' (I used rdtsc
to count the number of cycles elapsed for the copy). But I suspect that this is not correct because my RAM config is as follows:
- Speed: 1333 MHz
- Bus width: 32bit
As per wikipedia, the theoretical bandwidth is calculated as follows:
clock speed * bus width * # bits per clock cycle per line (2 for ddr 3 ram) 1333 MHz * 32 * 2 ~= 8GB/sec.
So mine is completely different from the estimated bandwidth. Any idea of what am I doing wrong?
=========
Other question is, bcopy involves both read and write. So does it mean that I should divide the calculated bandwidth by two to get only the read or only the write bandwidth? I would like to confirm whether the bandwidth is just the inverse of latency? Please suggest any other ways of measuring the bandwidth.
I can't comment on the effectiveness of bcopy, but the most straightforward approach is the second method you stated (with a stride of 1). Additionally, you are confusing bits with bytes in your memory bandwidth equation. 32 bits = 4bytes. Modern computers use 64 bit wide memory buses. So your effective transfer rate (assuming DDR3 tech)
1333Mhz * 64bit/(8bits/byte) = 10666MB/s (also classified as PC3-10666)
The 1333Mhz already has the 2 transfer/clock factored in.
Check out the wiki page for more info: http://en.wikipedia.org/wiki/DDR3_SDRAM
Regarding your results, try again with the array access. Malloc 1GB and traverse the entire thing. You can sum each element of the array and print it out so your compiler doesn't think it's dead code.
Something like this: