[Vulkan] Implement BatchNorm operator (#80510)
Summary:
Implemented BatchNorm operator for the Vulkan backend.
This is a special-case implementation with the following restrictions:
- Input tensor must be 4-dim, i.e. [N, C, H, W].
- C must be a multiple of 4.
- weight and bias must be defined.
- Only supports evaluation mode. Therefore, running_mean and running_var must also be defined.
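
For reviewers, the restrictions above can be illustrated with a minimal CPU-side sketch of evaluation-mode batch normalization, i.e. `y = (x - running_mean) / sqrt(running_var + eps) * weight + bias` applied per channel. This is not the Vulkan shader or the code added in this PR: the function name `batch_norm_eval_reference`, its signature, and the flat contiguous NCHW buffer layout are assumptions made purely for illustration. The default `eps` matches the documented `BatchNorm2d` default.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical CPU reference for eval-mode batch norm (illustrative only,
// not the Vulkan implementation). `input` is assumed to be a contiguous
// NCHW buffer of size N * C * H * W.
std::vector<float> batch_norm_eval_reference(
    const std::vector<float>& input,
    std::size_t N, std::size_t C, std::size_t H, std::size_t W,
    const std::vector<float>& weight,
    const std::vector<float>& bias,
    const std::vector<float>& running_mean,
    const std::vector<float>& running_var,
    float eps = 1e-5f) {
  // Mirror the restrictions listed in the summary.
  if (C % 4 != 0) {
    throw std::invalid_argument("C must be a multiple of 4");
  }
  if (input.size() != N * C * H * W) {
    throw std::invalid_argument("input must hold N * C * H * W elements");
  }
  if (weight.size() != C || bias.size() != C ||
      running_mean.size() != C || running_var.size() != C) {
    throw std::invalid_argument(
        "weight, bias, running_mean and running_var must each have C elements");
  }

  std::vector<float> output(input.size());
  const std::size_t plane = H * W;
  for (std::size_t n = 0; n < N; ++n) {
    for (std::size_t c = 0; c < C; ++c) {
      // Fold the per-channel statistics into one scale and one shift.
      const float scale = weight[c] / std::sqrt(running_var[c] + eps);
      const float shift = bias[c] - running_mean[c] * scale;
      const std::size_t base = (n * C + c) * plane;
      for (std::size_t i = 0; i < plane; ++i) {
        output[base + i] = input[base + i] * scale + shift;
      }
    }
  }
  return output;
}
```

Because only evaluation mode is covered, the running statistics are read and never updated, which is why all four per-channel tensors have to be supplied.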
References:
- PyTorch Docs > torch.nn > [BatchNorm2d](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html)
Test Plan:
Added 3 test cases to `/xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp`
On Mac:
```
buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac
```
On Android:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Differential Revision: D37519389
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80510
Approved by: https://github.com/SS-JIA