
Vulkan's VkAllocationCallbacks implemented wit

Posted 2019-05-11 09:28

Question:

I'm reading Vulkan Memory Allocation - Memory Host, and it seems that VkAllocationCallbacks can be implemented using naive malloc/realloc/free functions.

typedef struct VkAllocationCallbacks {
   void*                                   pUserData;
   PFN_vkAllocationFunction                pfnAllocation;
   PFN_vkReallocationFunction              pfnReallocation;
   PFN_vkFreeFunction                      pfnFree;
   PFN_vkInternalAllocationNotification    pfnInternalAllocation;
   PFN_vkInternalFreeNotification          pfnInternalFree;
} VkAllocationCallbacks;

But I see only two possible reasons to implement my own VkAllocationCallbacks:

  • Log and track memory usage by the Vulkan API;
  • Implement a kind of heap memory management, that is, a large chunk of memory to be used and reused over and over. Obviously, this can be overkill and suffer from the same sort of problems as managed memory (as in the Java JVM).

Am I missing something here? What sort of applications would be worth implementing VkAllocationCallbacks for?

Answer 1:

From the spec: "Since most memory allocations are off the critical path, this is not meant as a performance feature. Rather, this can be useful for certain embedded systems, for debugging purposes (e.g. putting a guard page after all host allocations), or for memory allocation logging."

With an embedded system, you might have grabbed all the memory right at the start, so you don't want the driver calling malloc because there might be nothing left in the tank. Guard pages and memory logging (for debug builds only) could be useful for the cautious/curious.
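The embedded case can be sketched as a bump allocator over a buffer reserved at startup. This is only an illustration under stated assumptions: Arena, arena_init and arena_alloc are hypothetical names, not part of Vulkan, and a real pfnAllocation wrapper would forward its size and alignment arguments to arena_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed arena; none of these names come from the Vulkan API. */
typedef struct Arena {
    unsigned char *base;   /* start of the memory reserved at startup */
    size_t         size;   /* total bytes in the arena */
    size_t         used;   /* bytes handed out so far */
} Arena;

static void arena_init(Arena *a, void *buffer, size_t size) {
    a->base = (unsigned char *)buffer;
    a->size = size;
    a->used = 0;
}

/* Returns a block aligned to `alignment` (a power of two), or NULL when full. */
static void *arena_alloc(Arena *a, size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t start   = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
    size_t    padding = (size_t)(aligned - start);
    size_t    left    = a->size - a->used;
    if (padding > left || size > left - padding) {
        return NULL;   /* reserved memory exhausted: report failure */
    }
    a->used += padding + size;
    return (void *)aligned;
}
```

A bump arena like this never releases individual blocks, so a matching free callback would be a no-op and a reallocation callback would have to allocate a new block and copy, which in turn requires tracking each block's size.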

I read on a slide somewhere (can't remember where, sorry) that you definitely should not implement allocation callbacks that just feed through to malloc/realloc/free because you can generally assume that the drivers are doing a much better job than that (e.g. consolidating small allocations into pools).

I think that if you're not sure whether you ought to be implementing allocation callbacks, then you don't need to implement allocation callbacks and you don't need to worry that maybe you should have.

I think they're there for those specific use cases and for those who really want to be in control of everything.



Answer 2:

This answer is an attempt to clarify and correct some of the information in the other answers...

Whatever you do, don't use plain malloc/free/realloc for a Vulkan allocator while ignoring the requested alignment. Vulkan can, and probably does, use aligned memory copies to move memory. Returning insufficiently aligned allocations will cause memory corruption, and bad things will happen. The corruption may not show itself in an obvious way either. Instead use aligned allocation functions: C11 aligned_alloc or POSIX posix_memalign (both released with free), or _aligned_malloc/_aligned_realloc/_aligned_free under Windows (declared in 'malloc.h'). _aligned_realloc is not well known, but it has been there for years; standard C and POSIX have no aligned realloc, so there you have to allocate, copy and free yourself. By the way, the allocations for my test card had alignment requests all over the place.
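For platforms without an aligned realloc, one portable approach is to over-allocate with malloc and keep a small header in front of each block. The sketch below assumes a C11 compiler; BlockHeader and the aligned_block_* functions are hypothetical helpers, not library or Vulkan functions. The realloc variant follows the behaviour the Vulkan specification asks of pfnReallocation: a NULL original acts like an allocation, a zero size acts like a free, and on failure the original block is left untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before each aligned block. */
typedef struct BlockHeader {
    void  *raw;    /* pointer returned by malloc, needed for free */
    size_t size;   /* usable size, needed to copy on realloc */
} BlockHeader;

static void *aligned_block_alloc(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Keep the header itself properly aligned. */
    if (alignment < _Alignof(BlockHeader)) alignment = _Alignof(BlockHeader);
    /* Room for the header plus worst-case padding. */
    size_t total = size + sizeof(BlockHeader) + alignment - 1;
    if (total < size) return NULL;               /* size overflow */
    unsigned char *raw = malloc(total);
    if (raw == NULL) return NULL;
    uintptr_t user = ((uintptr_t)raw + sizeof(BlockHeader) + alignment - 1)
                     & ~(uintptr_t)(alignment - 1);
    BlockHeader *h = (BlockHeader *)user - 1;
    h->raw  = raw;
    h->size = size;
    return (void *)user;
}

static void aligned_block_free(void *ptr) {
    if (ptr == NULL) return;                     /* freeing NULL is a no-op */
    free(((BlockHeader *)ptr - 1)->raw);
}

static void *aligned_block_realloc(void *ptr, size_t size, size_t alignment) {
    if (ptr == NULL) return aligned_block_alloc(size, alignment);
    if (size == 0) { aligned_block_free(ptr); return NULL; }
    void *fresh = aligned_block_alloc(size, alignment);
    if (fresh == NULL) return NULL;              /* original block stays valid */
    size_t old = ((BlockHeader *)ptr - 1)->size;
    memcpy(fresh, ptr, old < size ? old : size); /* preserve the common prefix */
    aligned_block_free(ptr);
    return fresh;
}
```

The cost is one header and up to alignment - 1 bytes of padding per block, plus a copy on every reallocation, which is acceptable for a debug allocator but worth measuring before using it in a release build.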

One thing that is non-obvious about passing an application-specific allocator to Vulkan is that at least some Vulkan objects "remember" the allocator. For example, I passed an allocator to vkCreateInstance and was very surprised to see messages coming from my allocator when allocating other objects (for which I had passed nullptr as the allocator). It made sense when I stopped to think about it, since objects that interact with the Vulkan instance may cause the instance to make additional allocations.

This all plays into Vulkan's performance, since individual allocators could be written and tuned for a specific allocation task, which could have an impact on process startup time. More importantly, a "block" allocator that places instance allocations, for example, near each other could have an impact on overall performance by improving cache locality (instead of having the allocations scattered all over memory). I realize that this kind of performance "enhancement" is very speculative, but in a carefully tuned application it could have an impact. (Not to mention the numerous other performance-critical paths in Vulkan that deserve more attention.)

Whatever you do, don't attempt to use the aligned allocation functions as a "release" allocator, as they have very poor performance compared to Vulkan's built-in allocator (on my test card). Even in simple programs there was a very noticeable performance difference compared to Vulkan's allocator. (Sorry, I didn't collect any timing information, but there was no way I was going to repeatedly sit through those lengthy startup times.)

When it comes to debugging, even something as simple as plain old printf calls inside the allocators can be enlightening. It is also easy to add simple statistics collection, but expect a severe performance penalty. The callbacks can also be useful as debug hooks without writing a fancy debug allocator or adding yet another debug layer.
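Simple statistics collection could look like the sketch below, assuming the counters are handed to the callbacks through pUserData. AllocStats and the stats_* functions are hypothetical names; the allocation and free callbacks would call them before doing the actual work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counters; a pointer to this struct would be stored in pUserData. */
typedef struct AllocStats {
    size_t alloc_calls;        /* number of allocation requests seen */
    size_t free_calls;         /* number of non-NULL frees seen */
    size_t bytes_requested;    /* sum of all requested sizes */
    size_t largest_request;    /* biggest single request */
    size_t largest_alignment;  /* biggest alignment asked for */
} AllocStats;

/* Call from the allocation callback. */
static void stats_on_alloc(void *pUserData, size_t size, size_t alignment) {
    AllocStats *s = (AllocStats *)pUserData;
    assert(s != NULL);
    s->alloc_calls++;
    s->bytes_requested += size;
    if (size > s->largest_request) s->largest_request = size;
    if (alignment > s->largest_alignment) s->largest_alignment = alignment;
}

/* Call from the free callback; NULL frees are not counted. */
static void stats_on_free(void *pUserData, const void *pMemory) {
    AllocStats *s = (AllocStats *)pUserData;
    assert(s != NULL);
    if (pMemory != NULL) s->free_calls++;
}

/* Print a one-line summary, for example at shutdown. */
static void stats_print(const AllocStats *s, FILE *out) {
    fprintf(out, "allocs %zu, frees %zu, bytes %zu, max size %zu, max align %zu\n",
            s->alloc_calls, s->free_calls, s->bytes_requested,
            s->largest_request, s->largest_alignment);
}
```

Counters like these are not thread-safe; if the callbacks can be invoked from several threads, the updates would need atomics or a lock.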

By the way, my test card was an NVIDIA one running release drivers.



Answer 3:

I implemented my own VkAllocationCallbacks using plain C's malloc()/realloc()/free(). It is a naive implementation and completely ignores the alignment parameter. Given that malloc on mainstream 64-bit OSes returns pointers with 16-byte (!) alignment, which is a pretty large alignment, this was not a problem in my tests. See Reference.

For completeness: a pointer aligned to 16 bytes is also aligned to 8, 4 and 2 bytes.
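Whether malloc's default alignment covers a given request can be checked explicitly instead of assumed. The sketch below assumes a C11 compiler that provides max_align_t in stddef.h; is_aligned and malloc_alignment_suffices are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when ptr is a multiple of alignment (alignment must be a power of two). */
static int is_aligned(const void *ptr, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* True when plain malloc is guaranteed to satisfy the requested alignment:
 * malloc returns memory suitably aligned for any fundamental type, that is,
 * aligned to at least _Alignof(max_align_t). */
static int malloc_alignment_suffices(size_t alignment) {
    return alignment <= _Alignof(max_align_t);
}
```

An allocation callback could assert is_aligned(ptr, alignment) before returning, which turns a silent misalignment into an immediate failure in debug builds.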

My code is the following:

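#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <vulkan/vulkan.h>

/**
 * PFN_vkAllocationFunction implementation
 */
void* allocationFunction(void* pUserData, size_t size, size_t alignment, VkSystemAllocationScope allocationScope){
    // size_t is printed with %zu; pUserData carries the tag string set in createPAllocator
    printf("pAllocator's allocationFunction: <%s>, size: %zu, alignment: %zu, allocationScope: %d",
        (const char*)pUserData, size, alignment, (int)allocationScope);
    // the allocation itself - ignores alignment for now (alternative: _aligned_malloc(size, alignment))
    void* ptr = malloc(size);
    if (ptr != NULL) {
        memset(ptr, 0, size);   // zero the block only when the allocation succeeded
    }
    printf(", return ptr* : 0x%p \n", ptr);
    return ptr;
}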

/**
 * The PFN_vkFreeFunction implementation
 */
void freeFunction(void* pUserData, void* pMemory){
    printf("pAllocator's freeFunction: <%s> ptr: 0x%p\n",
        (const char*)pUserData, pMemory);
    // now, the free operation! free(NULL) is a harmless no-op
    free(pMemory);
}

/**
 * The PFN_vkReallocationFunction implementation
 */
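void* reallocationFunction(void* pUserData, void* pOriginal, size_t size, size_t alignment, VkSystemAllocationScope allocationScope){
    printf("pAllocator's REallocationFunction: <%s>, size %zu, alignment %zu, allocationScope %d \n",
        (const char*)pUserData, size, alignment, (int)allocationScope);
    // Vulkan expects size == 0 to behave like a free; realloc(ptr, 0) is not portable
    if (size == 0) {
        free(pOriginal);
        return NULL;
    }
    return realloc(pOriginal, size);   // still ignores alignment, like the allocation above
}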

/**
 * PFN_vkInternalAllocationNotification implementation
 */
void internalAllocationNotification(void* pUserData, size_t size, VkInternalAllocationType allocationType, VkSystemAllocationScope allocationScope){
    // the notification carries no alignment; both enums are printed as integers
    printf("pAllocator's internalAllocationNotification: <%s>, size %zu, allocationType %d, allocationScope %d \n",
        (const char*)pUserData, size, (int)allocationType, (int)allocationScope);
}

/**
 * PFN_vkInternalFreeNotification implementation
 **/
void internalFreeNotification(void* pUserData, size_t size, VkInternalAllocationType allocationType, VkSystemAllocationScope allocationScope){
    // the notification carries no alignment; both enums are printed as integers
    printf("pAllocator's internalFreeNotification: <%s>, size %zu, allocationType %d, allocationScope %d \n",
        (const char*)pUserData, size, (int)allocationType, (int)allocationScope);
}



 /**
  * Create Pallocator
  * @param info - String for tracking Allocator usage
  */
static VkAllocationCallbacks* createPAllocator(const char* info){
    VkAllocationCallbacks* m_allocator = (VkAllocationCallbacks*)malloc(sizeof(VkAllocationCallbacks));
    if (m_allocator == NULL) {
        return NULL;   // out of memory: the caller can fall back to a NULL pAllocator
    }
    memset(m_allocator, 0, sizeof(VkAllocationCallbacks));
    // info must outlive the allocator, since the callbacks print it
    m_allocator->pUserData = (void*)info;
    m_allocator->pfnAllocation = (PFN_vkAllocationFunction)(&allocationFunction);
    m_allocator->pfnReallocation = (PFN_vkReallocationFunction)(&reallocationFunction);
    m_allocator->pfnFree = (PFN_vkFreeFunction)&freeFunction;
    m_allocator->pfnInternalAllocation = (PFN_vkInternalAllocationNotification)&internalAllocationNotification;
    m_allocator->pfnInternalFree = (PFN_vkInternalFreeNotification)&internalFreeNotification;
    // storePAllocator(m_allocator);
    return m_allocator;
}


I used the Cube.c example from the Vulkan SDK to test my code and assumptions. A modified version is available on GitHub.

A sample of output:

pAllocator's allocationFunction: <Device>, size: 800, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061ECE40 
pAllocator's allocationFunction: <RenderPass>, size: 128, alignment: 8, allocationScope: 1, return ptr* : 0x000000000623FAB0 
pAllocator's allocationFunction: <ShaderModule>, size: 96, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F2C30 
pAllocator's allocationFunction: <ShaderModule>, size: 96, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F8790 
pAllocator's allocationFunction: <PipelineCache>, size: 152, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F2590 
pAllocator's allocationFunction: <Device>, size: 424, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F8EB0 
pAllocator's freeFunction: <ShaderModule> ptr: 0x00000000061F8790
pAllocator's freeFunction: <ShaderModule> ptr: 0x00000000061F2C30
pAllocator's allocationFunction: <Device>, size: 3448, alignment: 8, allocationScope: 1, return ptr* : 0x000000000624D260 
pAllocator's allocationFunction: <Device>, size: 3448, alignment: 8, allocationScope: 1, return ptr* : 0x0000000006249A80 

Conclusions:

  • The user-implemented PFN_vkAllocationFunction, PFN_vkReallocationFunction and PFN_vkFreeFunction really do perform malloc/realloc/free operations on behalf of Vulkan. I am not sure whether they serve ALL allocations, as Vulkan may choose to allocate and free some memory by itself.

  • The output from my implementation shows that the typical alignment requested is 8 bytes on my Win 7 64-bit/NVIDIA machine. This suggests there is room for optimization, such as a kind of managed memory where you grab a large chunk of memory and sub-allocate from it for your Vulkan app (a memory pool). It *may* reduce memory usage (think of the 8 bytes before and up to 8 bytes after each allocated block). It may also be faster, since a malloc() call can take longer than handing out a pointer into a pool you have already allocated.

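  • At least with my current Vulkan drivers, PFN_vkInternalAllocationNotification and PFN_vkInternalFreeNotification are never called. Perhaps it is a bug in my NVIDIA drivers, or perhaps the driver simply makes no internal allocations of the kind these notifications report. I'll check on my AMD card later.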

  • pUserData can be used for debug info, for management, or both. You can even use it to pass a C++ object and do all the performance-related work there. It may be obvious, but you can pass a different pUserData for each call or vkCreateXXX object.

  • You can use a single, generic VkAllocationCallbacks allocator for the whole application, but I guess that a customised allocator may lead to better results. In my test, VkSemaphore creation shows a typical pattern of intense alloc/free of small chunks (72 bytes), which a customised allocator could address by reusing a previously freed chunk of memory. malloc()/free() already reuse memory when possible, but it is tempting to try our own memory manager, at least for short-lived small blocks of memory.

  • Memory alignment may be an issue when implementing VkAllocationCallbacks: standard C has no aligned realloc, although the Windows CRT does provide _aligned_realloc alongside _aligned_malloc and _aligned_free. It only matters if Vulkan requests alignments bigger than malloc's default (typically 8 bytes on x86 and 16 on AMD64; ARM defaults must be checked). So far, it seems Vulkan actually requests memory with a lower alignment than malloc()'s default, at least on 64-bit OSes.
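The idea of reusing small, short-lived chunks can be sketched as a fixed-size block pool with a free list, falling back to malloc for everything else. This is a minimal sketch under stated assumptions: SmallPool and the pool_* functions are hypothetical names, the block size of 80 is simply the 72-byte request rounded up to a multiple of 16, a C11 compiler with max_align_t is assumed, and requests with an alignment above malloc's default are refused rather than handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_BLOCK_SIZE  80   /* hypothetical: 72-byte requests rounded up */
#define POOL_BLOCK_COUNT 64   /* hypothetical pool capacity */

typedef union PoolBlock {
    union PoolBlock *next;                    /* link while the block is free */
    max_align_t      align;                   /* gives blocks malloc-like alignment */
    unsigned char    bytes[POOL_BLOCK_SIZE];  /* payload while the block is in use */
} PoolBlock;

typedef struct SmallPool {
    PoolBlock  blocks[POOL_BLOCK_COUNT];
    PoolBlock *free_list;
} SmallPool;

static void pool_init(SmallPool *p) {
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; ++i) {   /* thread every block onto the list */
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* True when ptr points into the pool's own storage. */
static int pool_owns(const SmallPool *p, const void *ptr) {
    uintptr_t lo = (uintptr_t)p->blocks;
    uintptr_t a  = (uintptr_t)ptr;
    return a >= lo && a < lo + sizeof p->blocks;
}

static void *pool_alloc(SmallPool *p, size_t size, size_t alignment) {
    if (alignment > _Alignof(max_align_t)) return NULL;  /* not handled in this sketch */
    if (size <= POOL_BLOCK_SIZE && p->free_list != NULL) {
        PoolBlock *b = p->free_list;          /* pop a recycled block */
        p->free_list = b->next;
        return b;
    }
    return malloc(size);                      /* large request or pool exhausted */
}

static void pool_free(SmallPool *p, void *ptr) {
    if (ptr == NULL) return;
    if (pool_owns(p, ptr)) {
        PoolBlock *b = ptr;                   /* push the block back for reuse */
        b->next = p->free_list;
        p->free_list = b;
    } else {
        free(ptr);
    }
}
```

With this layout an alloc/free/alloc sequence of 72-byte requests gets the same block back without touching the system heap; whether that is actually faster than the driver's default allocator would have to be measured.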

Final Thought:

You can live happily until the end of time by just passing NULL for every VkAllocationCallbacks* pAllocator you find ;) Vulkan's default allocator quite possibly already does it all better than you would.

BUT...

One of the highlighted benefits of Vulkan was that the developer would be put in control of everything, including memory management. See the Khronos presentation, slide 6.