GPUltra | Vulkan-CUDA memory interoperability

February 11, 2021·5 min read

Vulkan-CUDA memory interoperability

This post describes how the same region of GPU memory can be used by CUDA and Vulkan at the same time. In a typical scenario, data is computed with CUDA and then visualized with Vulkan. In GPUltra, we render volumes and process animation frames with CUDA, while Vulkan renders geometric objects and draws user interface elements. There is another reason why we use the interoperability, and why you might want to use it as well; we come back to it at the end of this post.

In order to use the same memory area in CUDA and Vulkan, first we need to export the memory from Vulkan and import it in CUDA. This is achieved with the following steps.

A Vulkan instance and device are created with proper extensions enabled.
Vulkan memory is allocated with an export option.
A descriptor or memory handle (depending on the operating system) is created.
The handle/descriptor is used to import and map the memory in CUDA.

Instance and device extensions

In order to be able to retrieve a descriptor or handle to a memory location, a Vulkan instance and device must be created with the following extensions enabled. The instance extension is VK_KHR_external_memory_capabilities while the device extensions are:

VK_KHR_external_memory,
VK_KHR_external_memory_fd (enabled on Linux),
VK_KHR_external_memory_win32 (enabled on Windows).

The Vulkan header files contain definitions with the extension names:

VK_KHR_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
VK_KHR_EXTERNAL_MEMORY_EXTENSION_NAME,
VK_KHR_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
VK_KHR_EXTERNAL_MEMORY_WIN32_EXTENSION_NAME (header vulkan_win32.h).
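
Before creating the device, it can help to verify that the required extensions are actually reported by the driver. The following is a minimal sketch rather than GPUltra's code: the helper names requiredDeviceExtensions and missingExtensions are hypothetical, and the list of available names is assumed to come from the extensionName fields returned by vkEnumerateDeviceExtensionProperties. The string literals are the values of the header macros listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Device extensions named in this post. The platform-specific one is
// selected at compile time.
std::vector<std::string> requiredDeviceExtensions() {
    return {
        "VK_KHR_external_memory",
#ifdef _WIN32
        "VK_KHR_external_memory_win32",
#else
        "VK_KHR_external_memory_fd",
#endif
    };
}

// Returns the entries of `required` that do not appear in `available`.
// `available` is assumed to hold the names reported by
// vkEnumerateDeviceExtensionProperties for the chosen physical device.
std::vector<std::string> missingExtensions(
    const std::vector<std::string>& required,
    const std::vector<std::string>& available) {
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        if (std::find(available.begin(), available.end(), name) == available.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

If missingExtensions returns a non-empty list, the device cannot export memory in the way described here, and the application can report the missing names or fall back to a path without interoperability.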

Vulkan memory with export option

Once the extensions are enabled, memory can be allocated with the export option. This is done by filling in a VkExportMemoryAllocateInfoKHR structure, whose values depend on the operating system. On Linux, the structure only needs to request an opaque file descriptor:

exportInfo->sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO;
exportInfo->pNext = nullptr;
exportInfo->handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT;

On Windows handles are used instead of file descriptors. Moreover, we must explicitly allow access to the memory handle. It just requires including the header file vulkan/vulkan_win32.h and calling a couple of functions which are shown in the code below.

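#include <stdlib.h>      // malloc
#include <string.h>      // memset
#include <windows.h>
#include <aclapi.h>      // SetEntriesInAcl, EXPLICIT_ACCESS
#include <dxgi1_2.h>     // DXGI_SHARED_RESOURCE_READ / DXGI_SHARED_RESOURCE_WRITE
#include <vulkan/vulkan.h>
#include <vulkan/vulkan_win32.h>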

// The extra space after the descriptor holds the PSID and PACL used below.
SECURITY_DESCRIPTOR* securityDescriptor = (SECURITY_DESCRIPTOR*)
    malloc(SECURITY_DESCRIPTOR_MIN_LENGTH + 2 * sizeof(void**));
if (InitializeSecurityDescriptor(securityDescriptor,SECURITY_DESCRIPTOR_REVISION) == 0) {
    // handle error
}

PSID* sid = (PSID*)((PBYTE)securityDescriptor + SECURITY_DESCRIPTOR_MIN_LENGTH);
SID_IDENTIFIER_AUTHORITY sid_identifier = SECURITY_WORLD_SID_AUTHORITY;
if (AllocateAndInitializeSid(&sid_identifier,1,SECURITY_WORLD_RID,
    0,0,0,0,0,0,0,sid) == 0) {
    // handle error
}

EXPLICIT_ACCESS explicitAccess;
memset((void*)&explicitAccess,0,sizeof(explicitAccess));
explicitAccess.grfAccessPermissions = STANDARD_RIGHTS_ALL | SPECIFIC_RIGHTS_ALL;
explicitAccess.grfAccessMode = SET_ACCESS;
explicitAccess.grfInheritance = INHERIT_ONLY;
explicitAccess.Trustee.TrusteeForm = TRUSTEE_IS_SID;
explicitAccess.Trustee.TrusteeType = TRUSTEE_IS_WELL_KNOWN_GROUP;
explicitAccess.Trustee.ptstrName = (LPTSTR)*sid;

PACL* acl = (PACL*)((PBYTE)sid + sizeof(PSID*));
if (SetEntriesInAcl(1,&explicitAccess,nullptr,acl) != ERROR_SUCCESS) {
    // handle error
}
if (SetSecurityDescriptorDacl(securityDescriptor,TRUE,*acl,FALSE) == 0) {
    // handle error
}

SECURITY_ATTRIBUTES securityAttributes;
securityAttributes.nLength = sizeof(SECURITY_ATTRIBUTES);
securityAttributes.lpSecurityDescriptor = securityDescriptor;
securityAttributes.bInheritHandle = TRUE;

VkExportMemoryWin32HandleInfoKHR handleInfo = {}; // zero-initialize pNext and name
handleInfo.sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_WIN32_HANDLE_INFO_KHR;
handleInfo.pAttributes = &securityAttributes;
handleInfo.dwAccess = DXGI_SHARED_RESOURCE_READ | DXGI_SHARED_RESOURCE_WRITE;

exportInfo->sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO;
exportInfo->handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_BIT;
exportInfo->pNext = &handleInfo;

The export structure is attached to the pNext field of the VkMemoryAllocateInfo structure. Once the structures are filled in, the memory is allocated with vkAllocateMemory. At this point the memory can be shared with CUDA, but first a descriptor or handle must be obtained so that it can be passed to CUDA.

Memory handle/descriptor

In order to import a block of Vulkan memory in CUDA, a file descriptor (on Linux) or a handle (on Windows) must be obtained. This requires the VkDeviceMemory handle returned by vkAllocateMemory. On Linux it is obtained as follows:

int descriptor;

// Device-level extension functions are looked up through the device.
PFN_vkGetMemoryFdKHR vkGetMemoryFdKHR =
    (PFN_vkGetMemoryFdKHR)vkGetDeviceProcAddr(vkDevice, "vkGetMemoryFdKHR");
if (vkGetMemoryFdKHR == nullptr) {
    // handle missing function
}

VkMemoryGetFdInfoKHR info = { VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR };
info.memory = vkDeviceMemory;
info.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;

if (vkGetMemoryFdKHR(vkDevice,&info,&descriptor) != VK_SUCCESS) {
    // handle error
}

On Windows the corresponding code is:

HANDLE handle;

// Device-level extension functions are looked up through the device.
PFN_vkGetMemoryWin32HandleKHR vkGetMemoryWin32HandleKHR =
    (PFN_vkGetMemoryWin32HandleKHR)vkGetDeviceProcAddr(vkDevice, "vkGetMemoryWin32HandleKHR");
if (vkGetMemoryWin32HandleKHR == nullptr) {
    // handle missing function
}

VkMemoryGetWin32HandleInfoKHR info =
    { VK_STRUCTURE_TYPE_MEMORY_GET_WIN32_HANDLE_INFO_KHR };
info.memory = vkDeviceMemory;
info.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_BIT;

if (vkGetMemoryWin32HandleKHR(vkDevice,&info,&handle) != VK_SUCCESS) {
    // handle error
}

Both vkGetMemoryFdKHR and vkGetMemoryWin32HandleKHR are provided by extensions. The functions may therefore be unavailable at run time, and the application must handle that case.
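
One way to handle a missing function in a single place is to route every lookup through a small helper that fails loudly. The sketch below is an illustration under assumptions, not part of the Vulkan API: loadRequired and VoidFunction are hypothetical names, and the loader argument stands in for whatever lookup the application uses (for example, a lambda wrapping vkGetDeviceProcAddr for a given device).

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Generic function-pointer type, mirroring what a loader such as
// vkGetDeviceProcAddr returns (PFN_vkVoidFunction in the Vulkan headers).
using VoidFunction = void (*)();

// Hypothetical helper: looks up `name` through `loader` and casts the result
// to the requested function-pointer type. Throws if the lookup returns null,
// so callers never end up holding a null function pointer.
template <typename Fn>
Fn loadRequired(const std::function<VoidFunction(const char*)>& loader,
                const char* name) {
    VoidFunction raw = loader(name);
    if (raw == nullptr) {
        throw std::runtime_error(std::string("missing extension function: ") + name);
    }
    return reinterpret_cast<Fn>(raw);
}
```

With such a helper, a call like loadRequired<PFN_vkGetMemoryFdKHR>(loader, "vkGetMemoryFdKHR") either yields a usable pointer or raises an error that names the missing function. An application that treats interoperability as optional could catch the exception and disable the feature instead.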

CUDA external memory

CUDA external memory is used like normal CUDA memory, through a pointer to it. Creating external memory requires the descriptor or handle of the exported memory and its size. It takes two steps:

The memory is imported by the function cuImportExternalMemory.
The memory is mapped using the function cuExternalMemoryGetMappedBuffer.

The first step consists of filling in the structure CUDA_EXTERNAL_MEMORY_HANDLE_DESC and calling cuImportExternalMemory. The fields set in the structure depend on the operating system. Importing the memory on Linux and on Windows is shown below, in that order.

// Linux
CUexternalMemory externalMemory;
CUDA_EXTERNAL_MEMORY_HANDLE_DESC desc = {}; // zero flags and reserved fields

desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD;
desc.handle.fd = descriptor;
desc.size = size;

if (cuImportExternalMemory(&externalMemory,&desc) != CUDA_SUCCESS) {
    // handle error
}

// Windows
CUexternalMemory externalMemory;
CUDA_EXTERNAL_MEMORY_HANDLE_DESC desc = {}; // zero flags and reserved fields

desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32;
desc.handle.win32.handle = handle;
desc.size = size;

if (cuImportExternalMemory(&externalMemory,&desc) != CUDA_SUCCESS) {
    // handle error
}

Once the memory is imported, it must be mapped to finally obtain the CUDA pointer to the memory. First, the structure CUDA_EXTERNAL_MEMORY_BUFFER_DESC is filled with the offset and size. Then the function cuExternalMemoryGetMappedBuffer is called.

CUdeviceptr pointer;

CUDA_EXTERNAL_MEMORY_BUFFER_DESC desc = {}; // zero the reserved fields
desc.flags = 0; // must be zero
desc.offset = offset;
desc.size = size;

if (cuExternalMemoryGetMappedBuffer(&pointer,externalMemory,&desc) != CUDA_SUCCESS) {
    // handle error
}

The resulting pointer can now be used like any other CUDA device pointer. It can be read from and written to, and all changes will be reflected in Vulkan - and conversely, all changes made in Vulkan will be visible in CUDA.
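
The offset and size passed to cuExternalMemoryGetMappedBuffer describe a sub-range of the imported memory, so a cheap check before the call can catch mistakes early. This is a minimal sketch assuming the range is expected to lie within the size given at import; mappedRangeIsValid is a hypothetical helper, and rejecting an empty range is a choice of this sketch rather than a documented CUDA rule.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when [offset, offset + size) lies inside an imported
// allocation of `importedSize` bytes. The comparison is arranged so that
// offset + size is never computed and therefore cannot overflow.
bool mappedRangeIsValid(std::uint64_t importedSize,
                        std::uint64_t offset,
                        std::uint64_t size) {
    return size > 0                 // assumption: empty mappings are rejected
        && offset <= importedSize
        && size <= importedSize - offset;
}
```

Calling the helper with the size used for the import and the offset and size intended for the buffer descriptor gives a clear error point on the application side, instead of relying only on the error code returned by CUDA.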

Conclusions

As we have shown, memory interoperability between CUDA and Vulkan is straightforward and takes only a few steps. Accessing the same block of memory from two APIs can lead to problems if the accesses are not synchronized. Memory interoperability has a counterpart that addresses this issue: semaphore interoperability. GPUltra, however, synchronizes access with mechanisms provided by the operating system and does not use Vulkan-CUDA semaphore interoperability. We therefore have no experience with it and do not cover it in this post.

GPUltra depends heavily on copying framebuffers for further processing, so it copies considerable blocks of memory. With both Vulkan and CUDA available, either can be used to copy the data. Since both run on the same hardware, one would expect similar performance. However, when we ran a simple test that copied the same amount of data with each, the results were surprising: on average, the copy in CUDA was around 30 times faster than in Vulkan.
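
The test setup is not described in detail here, so the following is only a sketch of how such a comparison could be structured, not the code behind the number above. The names meanMilliseconds and speedup are hypothetical. The copy callable is assumed to block until the copy has completed (for example by calling vkQueueWaitIdle or cuCtxSynchronize), since GPU copies are normally asynchronous.

```cpp
#include <cassert>
#include <chrono>

// Runs `copy` `iterations` times and returns the mean wall-clock time per
// call in milliseconds. `copy` must wait for the GPU to finish; otherwise
// only the submission time is measured.
template <typename Copy>
double meanMilliseconds(Copy&& copy, int iterations) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        copy();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}

// Ratio of two mean times; a value of 30 means the second path took
// one thirtieth of the time of the first.
double speedup(double slowerMs, double fasterMs) {
    return slowerMs / fasterMs;
}
```

Measuring both paths with the same buffer size and the same number of iterations, and discarding the first few runs as warm-up, keeps the two numbers comparable.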