I read somewhere that Java can allocate memory for an object in about 12 machine instructions, which I find quite impressive. As far as I understand, one of the tricks the JVM uses is preallocating memory in chunks. This helps minimize the number of requests to the operating system, which I guess are quite expensive. But even a CAS operation can cost up to 150 cycles on modern processors.
So, could anyone explain the real cost of memory allocation in Java and which tricks the JVM uses to speed it up?
The best trick is the generational garbage collector. It keeps the heap unfragmented, so allocating memory amounts to advancing the pointer to the free space and returning its old value. If memory runs out, the garbage collector copies the live objects, which leaves a new unfragmented heap.
Different threads would otherwise have to synchronize on the free-space pointer every time they advance it, so each thread preallocates a chunk. A thread can then allocate new memory from its own chunk without taking a lock.
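To make the chunk idea concrete, here is a rough sketch in plain Java. It is only a model, not how HotSpot is actually implemented: the class name ChunkedHeap, the 1024-byte CHUNK_SIZE, and the use of long offsets in place of real addresses are all assumptions made for illustration. The shared pointer is touched (with a CAS) only when a thread needs a fresh chunk; ordinary allocations just bump a thread-private cursor.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only: "addresses" are offsets into an imaginary heap.
final class ChunkedHeap {
    static final long CHUNK_SIZE = 1024; // hypothetical chunk size

    private final long heapSize;
    // Shared free-space pointer; contended only when a chunk is claimed.
    private final AtomicLong sharedTop = new AtomicLong(0);
    // Per-thread chunk stored as {cursor, limit}; starts out empty.
    private final ThreadLocal<long[]> chunk =
            ThreadLocal.withInitial(() -> new long[] {0, 0});

    ChunkedHeap(long heapSize) {
        this.heapSize = heapSize;
    }

    // Returns the offset of the new object, or -1 if the request cannot be met.
    long allocate(long size) {
        if (size <= 0 || size > CHUNK_SIZE) {
            return -1; // objects larger than a chunk are not modeled here
        }
        long[] c = chunk.get();
        if (c[1] - c[0] < size) {
            // Slow path: the current chunk is too full, claim a new one.
            // The unused tail of the old chunk is simply abandoned.
            long start = claimChunk();
            if (start < 0) {
                return -1; // heap exhausted; a real JVM would collect here
            }
            c[0] = start;
            c[1] = start + CHUNK_SIZE;
        }
        // Fast path: no lock and no CAS, just bump the thread-private cursor.
        long address = c[0];
        c[0] += size;
        return address;
    }

    // Advances the shared pointer by one chunk using a CAS retry loop.
    private long claimChunk() {
        while (true) {
            long start = sharedTop.get();
            if (start + CHUNK_SIZE > heapSize) {
                return -1;
            }
            if (sharedTop.compareAndSet(start, start + CHUNK_SIZE)) {
                return start;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChunkedHeap heap = new ChunkedHeap(4096);
        System.out.println(heap.allocate(16)); // 0: main thread claims the first chunk
        Thread worker = new Thread(() -> System.out.println(heap.allocate(16))); // 1024
        worker.start();
        worker.join();
        System.out.println(heap.allocate(16)); // 16: main thread continues in its own chunk
    }
}
```

The point of the design is that the expensive CAS is paid once per chunk rather than once per object, so its cost is spread over many allocations.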
All of this is explained in more detail here: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
There is no single memory allocator for the JVM. IIRC, Sun's JVM and IBM's manage memory differently. Generally, however, the JVM initially allocates one piece of memory; this segment is small enough to live in the processor's cache, making all access to it extremely fast.
As your application creates objects, they take memory from within this segment. Object allocation within the segment is simply pointer arithmetic.
Initially the offset into the freshly minted segment is zero, so the first object allocated has an 'address' (actually an offset into the segment) of zero. When you allocate an object, the memory manager knows how big it is, reserves that much space within the segment (16 bytes, say), and then increments its "offset address" by that amount. This makes memory allocation blindingly fast: it's just pointer arithmetic.
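As a toy illustration of that pointer arithmetic, something like the following captures the idea. The class name BumpAllocator and the 64-byte segment are made up for the example, and a real JVM does this on raw memory in a handful of machine instructions rather than in Java.

```java
// Toy model of bump-pointer allocation inside one pre-reserved segment.
// Addresses are modeled as int offsets into the segment.
final class BumpAllocator {
    private final int capacity; // size of the segment in bytes
    private int top = 0;        // offset of the first free byte

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new object, or -1 if it does not fit
    // (the point at which a real memory manager would have to collect).
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) {
            return -1;
        }
        int start = top; // the object's "address" is the old offset
        top += size;     // bump the offset past the new object
        return start;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator segment = new BumpAllocator(64);
        System.out.println(segment.allocate(16)); // 0
        System.out.println(segment.allocate(16)); // 16
        System.out.println(segment.allocate(40)); // -1, only 32 bytes are left
    }
}
```

Note that there is no free list to search and no per-object bookkeeping on the allocation path, which is where the speed comes from.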
Sun have a whitepaper here http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf and IBM used to have a bunch of stuff on ibm.com/developerworks
The JVM pre-allocates an area of memory for each thread (a TLA, or Thread Local Area). When a thread needs to allocate memory, it uses "bump-the-pointer" allocation within that area. (If the "free pointer" points to address 10 and the object to be allocated has size 50, then we just bump the free pointer to 60 and tell the thread that it can use the memory between 10 and 59 for the object.)