I'm benchmarking with “Perf” (Linux, gcc).
When allocating memory:
point_1 = calloc (100000000, 16); //this takes nearly 1 second and perf reports 27M transfers from RAM->CACHE and 1M from CACHE->RAM
This is OK.
But when trying to allocate two arrays:
point_1 = calloc (100000000, 16);
point_2 = calloc (100000000, 16);
//again, the program takes nearly 1 second, 27M transfers RAM->CACHE, 1M CACHE->RAM
It looks as if the second “calloc” (and every one after it) behaves like “malloc”. I'm using “gcc version 4.9.2 (Ubuntu 4.9.2-0ubuntu1~12.04)”. Otherwise the program works fine.
Is this expected behavior?
Here are some more tests and results:
Time for allocating of data structure 1: 0.976468
Perf: R: 27M, W: 1M

Time for allocating of data structure 1: 0.975402
Time for initialization of data structure 1 to value of 7: 0.296787
Perf: R: 52M, W: 26M

Time for allocating of data structure 1: 0.976034
Time for initialization of data structure 1 to value of 7: 0.313554
Time for allocating of data structure 2: 0.000031 <-- misbehaving
Perf: R: 52M, W: 26M

Time for allocating of data structure 1: 0.975403
Time for initialization of data structure 1 to value of 7: 0.313710
Time for allocating of data structure 2: 0.000031 <-- misbehaving
Time for initialization of data structure 2 to value of 7: 0.809855
Perf: R: 79M, W: 53M
Each of the calls tries to allocate 1.6 GB of memory. I suspect the second call is failing, which would explain the symptoms. Check the return value from calloc().
I can confirm that the second calloc takes a much shorter time. It seems that Linux decides to postpone some of the actual work.
On my system, the first calloc takes around 0.7 seconds. If I then iterate over the allocated memory area, setting it to something other than zero, this takes 0.2 seconds. In total, 0.9 seconds.
The second calloc then takes 0.0 seconds, but setting the second area takes 0.9 seconds. The total time is the same, so it seems that the second calloc, as Karoly Horvath wrote in a comment, doesn't actually create the memory pages, but leaves that to the page faults that occur when the memory is first accessed.
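For anyone who wants to reproduce this kind of measurement, here is a minimal sketch. It is not the code behind the numbers above: the helper names `now_seconds` and `time_calloc_and_touch` are made up for illustration, and it assumes a POSIX system with `clock_gettime`. It times the calloc call and the first write pass separately, and it also checks the return value, as suggested in the other answer.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock seconds read from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper (not from the original posts).
 * Allocates count * size zeroed bytes with calloc, then writes the value 7
 * into every byte. The time spent in calloc and the time spent in the write
 * pass are stored separately. Returns 0 on success and -1 if calloc failed.
 * On success the caller owns *out and must free() it.
 */
static int time_calloc_and_touch(size_t count, size_t size,
                                 double *alloc_s, double *touch_s, void **out)
{
    assert(alloc_s != NULL && touch_s != NULL && out != NULL);

    double t0 = now_seconds();
    void *p = calloc(count, size);
    double t1 = now_seconds();

    if (p == NULL) {
        return -1; /* the allocation itself failed */
    }

    /* calloc succeeded, so count * size did not overflow. */
    memset(p, 7, count * size);
    double t2 = now_seconds();

    *alloc_s = t1 - t0;
    *touch_s = t2 - t1;
    *out = p;
    return 0;
}
```

Calling it twice from `main` with `(100000000, 16)`, and freeing the first block only after the second call has returned, mirrors the scenario in the question. A return value of -1 would point to a failed allocation rather than to postponed work.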
Another great comment by Karoly Horvath linked to this related question: Why malloc+memset is slower than calloc?
Tested on Ubuntu 14.04.1 LTS running on an Intel Core i7-4790K, with -O2 and a GCC that calls itself "gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2". Glibc version is Ubuntu EGLIBC 2.19-0ubuntu6.4.
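The page-fault explanation can also be checked more directly than by timing. The following is a rough sketch, assuming Linux or another POSIX system where `getrusage` fills in `ru_minflt`. The helper names `minor_faults` and `count_faults` are hypothetical. It counts the minor page faults that occur during calloc and during the first write to each page.

```c
#define _POSIX_C_SOURCE 200809L /* for getrusage and sysconf under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/*
 * Hypothetical helper (not from the original posts).
 * Allocates count * size bytes with calloc, then writes one byte per page.
 * Stores how many minor faults happened during each phase.
 * Returns 0 on success and -1 if calloc failed.
 */
static int count_faults(size_t count, size_t size,
                        long *during_calloc, long *during_touch)
{
    assert(during_calloc != NULL && during_touch != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        page = 4096; /* assumed fallback page size */
    }

    long f0 = minor_faults();
    unsigned char *p = calloc(count, size);
    long f1 = minor_faults();
    if (p == NULL) {
        return -1;
    }

    /* volatile keeps the compiler from dropping the stores before free(). */
    volatile unsigned char *v = p;
    size_t bytes = count * size; /* safe: calloc already checked for overflow */
    for (size_t i = 0; i < bytes; i += (size_t)page) {
        v[i] = 7;
    }
    long f2 = minor_faults();

    *during_calloc = f1 - f0;
    *during_touch = f2 - f1;
    free(p);
    return 0;
}
```

If a calloc call returns almost instantly and nearly all of the faults are counted in the write phase, that would be consistent with the pages being created lazily on first access.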