[Vulkan] cat operator for channel dimension (#66669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66669
Implemented `cat` operator for channel dimension
**Facts:**
* texture coordinate: x(width), y(height), z(depth)
* input x, y, z -> no change
* out x, y -> no change
* only out z and the vec4 indices i, j matter
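The facts above imply a packing along z, which can be sketched in Python as follows. This is only an illustration and assumes, as the equations in this summary suggest, that the flattened index `batch * ch_size + channel` is packed four values per texel along z. The helper names `texel_coords` and `batch_channel` are hypothetical and are not part of the Vulkan backend.

```python
# Hypothetical helpers, not part of the PyTorch Vulkan backend.
# Assumption: for a fixed (x, y), values are laid out along z by the
# flattened index batch * ch_size + channel, four values per vec4 texel.

def texel_coords(batch, channel, ch_size):
    """Return (z, component) of the texel slot holding (batch, channel)."""
    flat = batch * ch_size + channel
    return flat // 4, flat % 4


def batch_channel(z, component, ch_size):
    """Inverse mapping: recover (batch, channel) from (z, component)."""
    flat = z * 4 + component
    return flat // ch_size, flat % ch_size


if __name__ == "__main__":
    # With 3 channels per batch, batch 1 / channel 2 has flattened index 5.
    print(texel_coords(1, 2, ch_size=3))
    print(batch_channel(1, 1, ch_size=3))
```

Since x and y never change, only this z/component mapping has to be recomputed when tensors are concatenated along channels.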
**Equations:**
batch_size = bt0 (or bt1 or bt2 or ...) = # of batches for tensor i
ch_size = ch0 (or ch1 or ch2 or ...) = # of channels for tensor i
ch_interval = ch0 + ch1 + ch2 + ... = total # of channels for all tensors
ch_size_allprior = ch0 (or ch0+ch1 or ch0+ch1+ch2 or ...) = # of channels for tensor 0 to i-1 where pos.z = d (input)
i = index of input texel = vec4[i] of texel at posIn(x,y,z) on input texture
j = index of output texel = vec4[j] of texel at posOut(x',y',z') on output texture
posIn[i] = {x,y,z} at ith index of vec4
src_index = posIn.z * 4 + i
dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior
d = posOut.z = int(dst_index / 4)
j = (dst_index % 4)
posOut[j] = {posIn.x, posIn.y, d} at jth index of vec4
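A small worked example of the equations, written as a Python sketch rather than shader code. The function name `dst_location` is hypothetical; the variable names follow the equations above, with `//` standing in for `int(a / b)` on non-negative integers.

```python
# Hypothetical helper that evaluates the equations above for one input
# component. It is an illustration only, not the shader implementation.

def dst_location(pos_in_z, i, ch_size, ch_interval, ch_size_allprior):
    """Return (d, j): the output z and vec4 index for input (pos_in_z, i)."""
    src_index = pos_in_z * 4 + i
    dst_index = (
        (src_index // ch_size) * ch_interval  # start of this batch in the output
        + (src_index % ch_size)               # channel offset inside this tensor
        + ch_size_allprior                    # channels of all earlier tensors
    )
    return dst_index // 4, dst_index % 4


if __name__ == "__main__":
    # Two tensors with ch0 = 3 and ch1 = 2, so ch_interval = 5.
    # Tensor 0, texel z = 1, i = 0: src_index = 4 (batch 1, channel 1).
    print(dst_location(1, 0, ch_size=3, ch_interval=5, ch_size_allprior=0))
    # Tensor 1, texel z = 0, i = 2: src_index = 2 (batch 1, channel 0).
    print(dst_location(0, 2, ch_size=2, ch_interval=5, ch_size_allprior=3))
```

In the first call dst_index = 1 * 5 + 1 + 0 = 6, so the value lands at d = 1, j = 2. In the second call dst_index = 1 * 5 + 0 + 3 = 8, so it lands at d = 2, j = 0.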
**Shader pseudo code:**
posOut = posIn;
for (i = 0; i < 4; ++i) {
  src_index = posIn.z * 4 + i;
  if (src_index >= ch_size * batch_size) break; // out of range
  dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior;
  posOut.z = int(dst_index / 4);
  j = (dst_index % 4);
  uOutput[j] = uInput[i];
}
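To sanity-check the pseudo code, the loop can be simulated on the CPU for a single (x, y) location. The Python sketch below is an assumption-laden model, not the actual shader: textures are plain lists of 4-element texels along z, the packing follows the flattened `batch * ch_size + channel` layout implied by the equations, and the names `pack` and `cat_channels` are hypothetical.

```python
# CPU model of the shader loop for one (x, y) location. Hypothetical names;
# textures are modeled as lists of vec4 texels (lists of 4 floats) along z.

def pack(values, ch_size, batch_size):
    """Pack values[batch][channel] into zero-padded vec4 texels along z."""
    flat = [values[b][c] for b in range(batch_size) for c in range(ch_size)]
    flat += [0.0] * (-len(flat) % 4)  # pad the last texel with zeros
    return [flat[k:k + 4] for k in range(0, len(flat), 4)]


def cat_channels(inputs, batch_size):
    """Concatenate inputs (each values[batch][channel]) along channels."""
    ch_sizes = [len(t[0]) for t in inputs]
    ch_interval = sum(ch_sizes)
    out_depth = -(-(batch_size * ch_interval) // 4)  # ceiling division
    out = [[0.0] * 4 for _ in range(out_depth)]
    ch_size_allprior = 0
    for values, ch_size in zip(inputs, ch_sizes):
        for z, texel in enumerate(pack(values, ch_size, batch_size)):
            for i in range(4):
                src_index = z * 4 + i
                if src_index >= ch_size * batch_size:
                    break  # out of range (padding)
                dst_index = ((src_index // ch_size) * ch_interval
                             + (src_index % ch_size) + ch_size_allprior)
                out[dst_index // 4][dst_index % 4] = texel[i]
        ch_size_allprior += ch_size
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # batch 2, 3 channels
    b = [[7.0, 8.0], [9.0, 10.0]]           # batch 2, 2 channels
    print(cat_channels([a, b], batch_size=2))
```

For these inputs the output texels come out as `[1, 2, 3, 7]`, `[8, 4, 5, 6]`, `[9, 10, 0, 0]`, which is the packed form of the expected result (batch 0: 1, 2, 3, 7, 8; batch 1: 4, 5, 6, 9, 10).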
Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Test result:
```
[ RUN ] VulkanAPITest.cat_dim1_samefeature_success
[ OK ] VulkanAPITest.cat_dim1_samefeature_success (101 ms)
[ RUN ] VulkanAPITest.cat_dim1_difffeature_success
[ OK ] VulkanAPITest.cat_dim1_difffeature_success (81 ms)
[ RUN ] VulkanAPITest.cat_dim1_texture2d_success
[ OK ] VulkanAPITest.cat_dim1_texture2d_success (2 ms)
[ RUN ] VulkanAPITest.cat_dim1_singledepth_success
[ OK ] VulkanAPITest.cat_dim1_singledepth_success (6 ms)
[ RUN ] VulkanAPITest.cat_dim1_singletensor_success
[ OK ] VulkanAPITest.cat_dim1_singletensor_success (21 ms)
[ RUN ] VulkanAPITest.cat_dim1_twotensors_success
[ OK ] VulkanAPITest.cat_dim1_twotensors_success (53 ms)
[ RUN ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success
[ OK ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success (17 ms)
[ RUN ] VulkanAPITest.cat_dim2_sameheight_success
[ OK ] VulkanAPITest.cat_dim2_sameheight_success (83 ms)
[ RUN ] VulkanAPITest.cat_dim2_diffheight_success
[ OK ] VulkanAPITest.cat_dim2_diffheight_success (86 ms)
[ RUN ] VulkanAPITest.cat_dim2_singledepth_success
[ OK ] VulkanAPITest.cat_dim2_singledepth_success (5 ms)
[ RUN ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[ OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (82 ms)
```
Reviewed By: SS-JIA
Differential Revision: D31593623
fbshipit-source-id: e52dc57985e3f0bb9b20313d4fcc7248a436e863