[Vulkan] Implement GLU operator (#80910)
Summary:
Implemented GLU operator for the Vulkan backend.
This is a special-case implementation with the following constraints:
- Input tensor must be 4-dim, i.e. [N, C, H, W].
- C must be a multiple of 8 (so the number of channels of the output tensor is a multiple of 4).
- dim must be 1.
References:
- PyTorch Docs > torch.nn > [GLU](https://pytorch.org/docs/stable/generated/torch.nn.GLU.html)
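For context, GLU splits the input in half along the given dimension and computes `a * sigmoid(b)`. A minimal NumPy sketch of that semantics (a reference model only, not the Vulkan shader implementation), using an input that satisfies the constraints above:

```python
import numpy as np

def glu_reference(x, dim=1):
    """Reference GLU: split x in half along `dim`, return a * sigmoid(b)."""
    a, b = np.split(x, 2, axis=dim)
    return a * (1.0 / (1.0 + np.exp(-b)))

# 4-dim input with C a multiple of 8, gated along dim == 1.
x = np.random.randn(2, 8, 3, 3).astype(np.float32)
y = glu_reference(x, dim=1)
assert y.shape == (2, 4, 3, 3)  # channel count is halved: 8 -> 4
```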
Test Plan:
Added a test case to `/xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp`
On Mac:
```
buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac
```
On Android:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Differential Revision: D37625389
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80910
Approved by: https://github.com/SS-JIA