Vulkan 1.4: Faster app loads, less stutter, and lower memory usage | by Shahbaz Youssefi | Android Developers | Dec, 2024


Host Image Copy is a game changer for Android

Vulkan 1.4 was released recently, and with it comes a significant feature for Android: Host Image Copy, based on VK_EXT_host_image_copy.

We have previously written about this extension in this Khronos blog post, explaining the technical details of using it. The extension is particularly useful for Android games, as we’ll see in this post.

In short, Host Image Copy is a Vulkan feature that allows the application to transfer image data using the CPU instead of the GPU. This feature is particularly useful on UMA devices (such as typical Android devices), but may place restrictions on images. In particular, most drivers disable framebuffer compression for host-copyable images that are otherwise renderable. Read on to learn where this feature really shines.

To put things in context, Host Image Copy is one way to asynchronously transfer image data. The other is using a dedicated transfer queue (with VK_QUEUE_TRANSFER_BIT, and without VK_QUEUE_GRAPHICS_BIT). In Vulkan 1.4, at least one of the two is required. You can expect that the vast majority of Android devices shipping with Vulkan 1.4 will implement Host Image Copy, and implement it optimally for compressed formats. That is, Vulkan requires optimalDeviceAccess to be true for these formats.
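As a minimal sketch of the "dedicated transfer queue" condition described above: the helper below scans a list of queue family flags for a family that supports transfer but not graphics. To stay self-contained it does not include the Vulkan headers; the three constants are local stand-ins that mirror the VkQueueFlagBits values, and the function name is hypothetical. In a real application the flags would come from vkGetPhysicalDeviceQueueFamilyProperties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the VkQueueFlagBits values in vulkan_core.h. */
enum {
    QUEUE_GRAPHICS_BIT = 0x1, /* VK_QUEUE_GRAPHICS_BIT */
    QUEUE_COMPUTE_BIT  = 0x2, /* VK_QUEUE_COMPUTE_BIT  */
    QUEUE_TRANSFER_BIT = 0x4  /* VK_QUEUE_TRANSFER_BIT */
};

/* Hypothetical helper: returns the index of the first queue family that
 * supports transfer but not graphics, or -1 if there is none. */
int find_dedicated_transfer_family(const uint32_t *queue_flags, size_t count)
{
    assert(queue_flags != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        int has_transfer = (queue_flags[i] & QUEUE_TRANSFER_BIT) != 0;
        int has_graphics = (queue_flags[i] & QUEUE_GRAPHICS_BIT) != 0;
        if (has_transfer && !has_graphics)
            return (int)i;
    }
    return -1;
}
```

If this returns -1 on a Vulkan 1.4 device, the requirement above implies that Host Image Copy is the asynchronous path to rely on.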

As it happens, texture data makes up the largest share of image data in typical games, and textures use compressed formats!

First, let’s see how Host Image Copy differs from doing data copies on the GPU, such as with vkCmdCopyBufferToImage2.

Without Host Image Copy, the path from texture data loaded from disk to an image goes through a Vulkan buffer:

  • A Vulkan buffer is allocated, taking up about as much memory as the Vulkan image does.
  • The texture data is copied (in the style of memcpy) to the buffer after the CPU maps it.
  • vkCmdCopyBufferToImage2 is recorded in the command buffer that is later submitted.
  • The texture data is copied to the image by the GPU.
  • The buffer memory is freed a few frames later, once the application knows the GPU copy is done.

In the above, the texture data is copied twice, and for a few frames the amount of memory allocated for the texture data is twice the size of the image. There are two further things to note here:

  • The copy on the CPU is as fast as it can get, because it is effectively memcpy.
  • The copy on the GPU efficiently reorders the data to match the physical layout of the image (a.k.a. layout swizzling), but it happens on the graphics queue (assuming no dedicated transfer queues), interfering with rendering in the same frame.

With Host Image Copy instead, the copy is done simply by calling vkCopyMemoryToImage. In this case, the CPU does both the copy and the layout swizzling. This copy is slower than each of the copies above, because the CPU is not as efficient at reordering the data, but:

  • The copy, even if slower, is only done once
  • The copy does not interfere with ongoing GPU work
  • There is no additional memory allocated for texture data
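The trade-off between the two paths can be put into numbers with a small model. This is only a sketch under the assumption stated above that the staging buffer is the same size as the image; the struct and function names are hypothetical and nothing here calls into Vulkan.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of what one texture upload costs. */
typedef struct {
    uint64_t peak_bytes;  /* memory held while the upload is in flight */
    unsigned cpu_copies;  /* copies performed by the CPU               */
    unsigned gpu_copies;  /* copies performed on a GPU queue           */
} UploadCost;

/* Staging path: memcpy into a buffer, then vkCmdCopyBufferToImage2.
 * Assumes the staging buffer is as large as the image. */
UploadCost staging_upload_cost(uint64_t image_bytes)
{
    assert(image_bytes <= UINT64_MAX / 2);
    UploadCost c = { 2 * image_bytes, 1, 1 };
    return c;
}

/* Host Image Copy path: one CPU copy straight into the image. */
UploadCost host_copy_upload_cost(uint64_t image_bytes)
{
    UploadCost c = { image_bytes, 1, 0 };
    return c;
}
```

For a 16 MiB texture, the model gives a 32 MiB peak with one CPU and one GPU copy for the staging path, against a 16 MiB peak with a single CPU copy for Host Image Copy.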

FYI, the reason this extension has less utility on NUMA devices, such as devices with dedicated GPUs (and dedicated memory), is that the CPU may not have access to the entire GPU memory, or that access may be too slow. This may limit the amount of memory that can be used for host-copyable textures, or make the copy prohibitively expensive. The identicalMemoryTypeRequirements property indicates whether Host Image Copy limits access to GPU memory or not.

In the following, two scenarios are presented where Host Image Copy can significantly improve a game, with the above properties in mind.

Removing stutter during texture data streaming while simultaneously halving memory usage sounds too good to be true, but that’s exactly the kind of thing Host Image Copy enables.

To set the scene: imagine an open-world game where you are nearing a new area and lots of new textures need to be loaded from persistent storage. You are cruising at 60 FPS; it would be a shame if that dropped to 20 FPS or the game crashed with Out of Memory.

Avoiding such stutters with Host Image Copy is very simple.

The application can use a CPU thread to stream texture data directly into new images using Host Image Copy. The GPU continues to render frames of consistent complexity as before, maintaining FPS, and the memory increase is as small as it can get. Don’t forget to memory map the texture data file instead of reading it into a CPU buffer first, for even more efficiency!
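The memory-mapping advice can be sketched with POSIX mmap. The helper names and the MappedFile struct below are hypothetical and error handling is kept minimal; the idea is that the mapped pointer (plus an offset to the texel data, if the file has a header) would then be passed as the pHostPointer of a VkMemoryToImageCopy region, so the file contents never pass through an intermediate CPU buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical read-only view of a texture file. */
typedef struct {
    const void *data;
    size_t size;
} MappedFile;

/* Maps a whole file read-only. Returns 0 on success, -1 on failure. */
int map_texture_file(const char *path, MappedFile *out)
{
    assert(path != NULL && out != NULL);
    out->data = NULL;
    out->size = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Releases the mapping once the host copy has returned. */
void unmap_texture_file(MappedFile *f)
{
    assert(f != NULL);
    if (f->data != NULL)
        munmap((void *)f->data, f->size);
    f->data = NULL;
    f->size = 0;
}
```

Because the host copy is synchronous on the calling thread, the mapping can be released as soon as the copy call returns.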

Can we apply the same strategy when the game is being loaded in the first place? Sure, use multiple CPU threads to copy texture data directly into images. Given that the CPU copy is slower due to layout swizzling, load times may not really be any faster, but at least the memory usage is halved!

But Host Image Copy has a secret way of making this much faster, as fast as memcpy! Basically, the CPU copy becomes just as efficient as the CPU copy in the GPU transfer scenario, the GPU copy is gone, and the GPU buffer is gone. It’s all goodness and no downsides. The key is VK_HOST_IMAGE_COPY_MEMCPY.

This flag is trivial: it simply tells the CPU not to do layout swizzling. The texture data being copied to the image is assumed to be pre-swizzled, and the copy is simply a memcpy. But since the layout swizzling of images on various devices is not public knowledge, how is this useful?

The answer is in image-to-memory copies with the same flag, that is, readback of swizzled image data without undoing the layout swizzling. Many high-fidelity AAA Android games download huge packages of texture data on the first run of the game. Take the following algorithm:

  • Download texture data
  • Use a temporary Vulkan image and call vkCopyMemoryToImage -> the CPU does layout swizzling
  • Read back the image contents with vkCopyImageToMemory with the VK_HOST_IMAGE_COPY_MEMCPY flag -> the returned data is pre-swizzled for this particular device/driver
  • Store only the pre-swizzled data to persistent storage, not the original texture data, to minimize storage footprint

The next time the game runs, it can simply use vkCopyMemoryToImage with VK_HOST_IMAGE_COPY_MEMCPY to copy the pre-swizzled data into the images as fast as a simple read of the file contents would be. This also happens to optimize the streaming scenario above!
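The round trip can be illustrated with a toy model that does not touch Vulkan at all. Everything in this sketch is an assumption made for illustration: the 4x4 tiling stands in for a driver’s private layout (real swizzle patterns are device-specific and not public), and the two toy_ functions only mimic the roles of vkCopyMemoryToImage and vkCopyImageToMemory with and without the memcpy flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { W = 8, H = 8, TILE = 4, TEXELS = W * H };

/* Toy image whose storage uses a "private" tiled layout, 1 byte per texel. */
typedef struct { uint8_t storage[TEXELS]; } ToyImage;

/* Made-up layout: 4x4 tiles stored one after another. */
static size_t swizzled_index(size_t x, size_t y)
{
    size_t tile = (y / TILE) * (W / TILE) + (x / TILE);
    return tile * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}

/* Plays the role of vkCopyMemoryToImage: swizzles unless use_memcpy is set. */
void toy_copy_memory_to_image(ToyImage *img, const uint8_t *src, int use_memcpy)
{
    assert(img != NULL && src != NULL);
    if (use_memcpy) {
        memcpy(img->storage, src, TEXELS); /* src is assumed pre-swizzled */
        return;
    }
    for (size_t y = 0; y < H; ++y)
        for (size_t x = 0; x < W; ++x)
            img->storage[swizzled_index(x, y)] = src[y * W + x];
}

/* Plays the role of vkCopyImageToMemory: raw readback if use_memcpy is set. */
void toy_copy_image_to_memory(const ToyImage *img, uint8_t *dst, int use_memcpy)
{
    assert(img != NULL && dst != NULL);
    if (use_memcpy) {
        memcpy(dst, img->storage, TEXELS); /* keeps the swizzled order */
        return;
    }
    for (size_t y = 0; y < H; ++y)
        for (size_t x = 0; x < W; ++x)
            dst[y * W + x] = img->storage[swizzled_index(x, y)];
}
```

On the first run, a linear texture goes through toy_copy_memory_to_image without the flag and is read back with the flag; the result is what would be cached on disk. On later runs, uploading that cached data with the flag set reproduces the same image storage using nothing but memcpy.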

The only gotcha is that driver updates may change the layout swizzling of images. The game needs to check that optimalTilingLayoutUUID is unchanged since the pre-swizzled texture data was cached, and redo the above if it ever changes. Fortunately, changes to the layout swizzle are rare. In practice, the game is unlikely to ever need to redownload or reprocess its texture data.
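A minimal sketch of that check, assuming the game stores the UUID in a small header in front of the cached data. The header struct and function name are hypothetical; UUID_SIZE has the same value as VK_UUID_SIZE, and in a real application current_uuid would be the optimalTilingLayoutUUID reported in the device’s Host Image Copy properties.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { UUID_SIZE = 16 }; /* same value as VK_UUID_SIZE */

/* Hypothetical header written in front of the cached pre-swizzled data. */
typedef struct {
    uint8_t layout_uuid[UUID_SIZE]; /* optimalTilingLayoutUUID at cache time */
} TextureCacheHeader;

/* Returns 1 if the cached data still matches the driver's layout and can be
 * uploaded with the memcpy flag, 0 if it has to be regenerated. */
int texture_cache_is_valid(const TextureCacheHeader *hdr,
                           const uint8_t current_uuid[UUID_SIZE])
{
    assert(hdr != NULL && current_uuid != NULL);
    return memcmp(hdr->layout_uuid, current_uuid, UUID_SIZE) == 0;
}
```

When the check fails, the game would fall back to the first-run path: upload the original data with swizzling, read it back with the memcpy flag, and rewrite the cache together with the new UUID.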

The Host Image Copy feature, conditionally required by Vulkan 1.4 and unconditionally required by Android 16 for new devices, is a game changer for games on Android. In this post we looked at a couple of easy but significant wins using this functionality, but there are others, notably asynchronous image memory defragmentation. Surely, your ingenuity will lead to other optimizations that are made possible by this feature.

Be sure to check out this post on the Khronos blog for more technical details around the usage of this functionality. As this functionality starts to become prevalent on Android phones, Vulkan games will probably be . Don’t miss out!
