[vulkan] Add buffer to texture and texture to buffer copies (#82799)
This diff adds functions that transfer data to and from GPU textures without going through the `nchw_to_image` and `image_to_nchw` shaders. Instead, host-side functions convert CPU tensors between NCHW and NC4HW formats, so that data can be copied directly into and out of image textures via `vkCmdCopy*` API calls. These functions are useful for loading tensors from memory during model loading, with the following benefits:
1. No need to construct a compute pipeline to transfer data to/from the GPU. This saves time when loading Vulkan models and when performing the first inference.
2. `vkCmdCopy*` is faster than compute shaders. For input and output tensors, it is still faster to use compute shaders, because the data would otherwise have to be rearranged between NCHW and NC4HW formats on the host. For weight tensors, which can be serialized directly in NC4HW format, copying the data directly is more beneficial.
3. It is much clearer from the code how data is represented on the GPU.
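The host-side repacking described above can be sketched as follows. This is a minimal illustration, not the code added in this diff: it assumes NC4HW means that channels are zero-padded to a multiple of 4 and grouped in fours as the innermost dimension (one group per RGBA texel), and the name `nchw_to_nc4hw_sketch` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the function added in this diff).
// Repacks a contiguous NCHW float array into an assumed NC4HW layout:
// [N * ceil(C/4), H, W, 4], where the innermost 4 values map to one texel.
std::vector<float> nchw_to_nc4hw_sketch(
    const std::vector<float>& src,
    std::size_t N, std::size_t C, std::size_t H, std::size_t W) {
  const std::size_t C4 = (C + 3) / 4;  // number of 4-channel groups
  // Zero-initialize so that padded channels read as 0.
  std::vector<float> dst(N * C4 * H * W * 4, 0.0f);
  for (std::size_t n = 0; n < N; ++n) {
    for (std::size_t c = 0; c < C; ++c) {
      for (std::size_t h = 0; h < H; ++h) {
        for (std::size_t w = 0; w < W; ++w) {
          const std::size_t src_idx = ((n * C + c) * H + h) * W + w;
          const std::size_t dst_idx =
              (((n * C4 + c / 4) * H + h) * W + w) * 4 + c % 4;
          dst[dst_idx] = src[src_idx];
        }
      }
    }
  }
  return dst;
}
```

Under this assumed layout, a weight tensor serialized in NC4HW order needs no such loop at load time and can be copied into the staging memory as-is.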
Note that this change necessitated changes to how the `StagingBuffer` class works (it is now called `StorageBuffer`). This is because the size and alignment of data copied to and from an image texture depend on the image format of the texture.
If an image texture is `VK_FORMAT_R16G16B16A16_SFLOAT`, the buffer copied from the texture can be interpreted as a contiguous array of 16-bit values. If the image texture is `VK_FORMAT_R32G32B32A32_SFLOAT`, the data buffer is an array of 32-bit values. Previously, `StagingBuffer` was constructed under the assumption that an array of 32-bit floats would be used to receive the data from the texture. This causes issues when `USE_VULKAN_FP16_INFERENCE` is turned on, which forces the `VK_FORMAT_R16G16B16A16_SFLOAT` image format to be used. Therefore, the `StorageBuffer` constructor now requires a data type and a number of elements, to ensure that the data buffer is sized properly.
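The sizing rule can be sketched as below. This is only an illustration of the arithmetic, not the actual `StorageBuffer` constructor: `ElemType`, `element_size`, and `storage_buffer_nbytes` are hypothetical names introduced here.

```cpp
#include <cstddef>

// Hypothetical element types: Half corresponds to the 16-bit components of
// VK_FORMAT_R16G16B16A16_SFLOAT, Float to the 32-bit components of
// VK_FORMAT_R32G32B32A32_SFLOAT.
enum class ElemType { Half, Float };

// Size in bytes of one element of the given type.
constexpr std::size_t element_size(ElemType t) {
  return t == ElemType::Half ? 2 : 4;
}

// Bytes needed for a buffer holding `numel` elements of type `t`.
constexpr std::size_t storage_buffer_nbytes(ElemType t, std::size_t numel) {
  return element_size(t) * numel;
}
```

For the same number of elements, a buffer sized for 32-bit floats is twice as large as one sized for 16-bit values, which is why assuming 32-bit floats no longer matches the data when the FP16 image format is in use.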
Differential Revision: [D38264300](https://our.internmc.facebook.com/intern/diff/D38264300/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82799
Approved by: https://github.com/manuelcandales