[Pytorch][Vulkan] sum.dim_IntList (#105612)
Summary:
Add Vulkan support for [sum](https://pytorch.org/docs/stable/generated/torch.sum.html).dim_IntList
[sum.dim_IntList](https://www.internalfb.com/code/fbsource/[49b7951b7eb6]/xplat/caffe2/aten/src/ATen/native/native_functions.yaml?lines=5466):
```
func: sum.dim_IntList(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor
```
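How the shader works:
For each output position `pos`:
- Iterate over the elements of the output texel (`out_texel`) and over the summed dimension.
- For H and W: rearrange `pos.x` and `pos.y` to address the source.
- For C, H, and W: when C, H, and W are all summed, batch moves into the channel dimension, so the source N is `pos.z * 4 + out_index`.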
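The batch-into-channel indexing above can be sketched on the CPU as follows. This is only an illustration, not the shader itself: it assumes each texel packs 4 values along the channel axis and that unused lanes are zero-padded, and the function name `sum_chw_packed` is hypothetical.

```python
def sum_chw_packed(src, n, c, h, w):
    """Sum an [N][C][H][W] nested list over C, H, W.

    Returns a list of 4-element texels, with batch packed along the
    channel axis (assumed layout: 4 values per texel, zero padding).
    """
    num_texels = (n + 3) // 4  # ceil(N / 4) output texels along z
    texels = []
    for pos_z in range(num_texels):
        out_texel = [0.0] * 4
        for out_index in range(4):
            # Source batch index, as described in the summary.
            src_n = pos_z * 4 + out_index
            if src_n >= n:
                continue  # padding lane past the last batch element
            total = 0.0
            for ci in range(c):
                for hi in range(h):
                    for wi in range(w):
                        total += src[src_n][ci][hi][wi]
            out_texel[out_index] = total
        texels.append(out_texel)
    return texels


# Example input: every element of batch n holds the value n + 1.
N, C, H, W = 5, 2, 1, 2
example = [[[[float(b + 1)] * W for _ in range(H)] for _ in range(C)]
           for b in range(N)]
packed = sum_chw_packed(example, N, C, H, W)
```

With N = 5, the five sums occupy two texels, and the last three lanes of the second texel are padding.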
Follow-ups:
Add support for `keepdim=true`
```
If keepdim is true, the output tensor has the same size as the input except in the dimension(s) dim, where it has size 1.
Otherwise, dim is squeezed, resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).
```
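As a rough illustration of the shape rule quoted above, the expected output shape can be computed like this. The helper `sum_output_shape` is hypothetical and is not part of this change; it only mirrors the documented `keepdim` semantics.

```python
def sum_output_shape(in_shape, dims, keepdim=False):
    """Return the output shape of a sum over `dims` (hypothetical helper)."""
    ndim = len(in_shape)
    # Normalize negative dims, e.g. -1 refers to the last dimension.
    reduced = {d % ndim for d in dims}
    if keepdim:
        # Reduced dimensions are kept with size 1.
        return [1 if i in reduced else s for i, s in enumerate(in_shape)]
    # Reduced dimensions are squeezed out.
    return [s for i, s in enumerate(in_shape) if i not in reduced]
```

For a `[2, 3, 4, 5]` input summed over dims `[1, 3]`, this gives `[2, 4]` without `keepdim` and `[2, 1, 4, 1]` with it.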
Add support for [sum](https://www.internalfb.com/code/fbsource/[49b7951b7eb6]/xplat/caffe2/aten/src/ATen/native/native_functions.yaml?lines=5457)
```
func: sum(Tensor self, *, ScalarType? dtype=None) -> Tensor
```
Test Plan:
New tests:
```
lfq@lfq-mbp fbsource % buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*.sum*"
Downloaded 0/53 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 47.4 sec (100%) 536/536 jobs, 8/536 updated
Total time: 47.5 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *.sum*
[==========] Running 5 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 5 tests from VulkanAPITest
[ RUN ] VulkanAPITest.sum_2d
[ OK ] VulkanAPITest.sum_2d (426 ms)
[ RUN ] VulkanAPITest.sum_3d
[ OK ] VulkanAPITest.sum_3d (2 ms)
[ RUN ] VulkanAPITest.sum_4d
[ OK ] VulkanAPITest.sum_4d (3 ms)
[ RUN ] VulkanAPITest.sum_3d_combined
[ OK ] VulkanAPITest.sum_3d_combined (1 ms)
[ RUN ] VulkanAPITest.sum_4d_combined
[ OK ] VulkanAPITest.sum_4d_combined (5 ms)
[----------] 5 tests from VulkanAPITest (437 ms total)
[----------] Global test environment tear-down
[==========] 5 tests from 1 test suite ran. (438 ms total)
[ PASSED ] 5 tests.
```
Ran clang-format on Sum.cpp and sum_dim.glsl.
Differential Revision: D47580428
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105612
Approved by: https://github.com/SS-JIA