[Vulkan] Implement clone operator (#69551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69551
Implemented `clone` operator in the Vulkan backend:
* Supports only <= 4D tensors.
* Internal name is `aten::clone`.
* Vulkan `clone` operator accepts only `c10::MemoryFormat::Preserve` and `c10::MemoryFormat::Contiguous` for the argument `c10::optional<c10::MemoryFormat> optional_memory_format`.
* Throws an exception if the `optional_memory_format` argument is neither `MemoryFormat::Preserve` nor `MemoryFormat::Contiguous`.
* CPU implementation: [/aten/src/ATen/native/TensorFactories.cpp::clone()](https://github.com/pytorch/pytorch/blob/3e45739543fbce471fc4ed26ff079efe170de0f1/aten/src/ATen/native/TensorFactories.cpp#L1415)
* MKL-DNN implementation: [/aten/src/ATen/native/mkldnn/TensorShape.cpp::mkldnn_clone()](https://github.com/pytorch/pytorch/blob/3e45739543fbce471fc4ed26ff079efe170de0f1/aten/src/ATen/native/mkldnn/TensorShape.cpp#L58)
* `self.copy_(src)` calls `copy_()` to perform the Vulkan-to-Vulkan copy, as the call trace below shows:
```
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() CPU -> Vulkan
vTensor::clone()
vTensor::clone() -> MemoryFormat::Preserve
vTensor::clone() -> MemoryFormat::Preserve -> self = at::empty_like(src)
vTensor::clone() self.copy_(src); -> BEFORE
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() Vulkan -> Vulkan
vTensor::clone() self.copy_(src); -> AFTER
vTensor::copy_()
vTensor::copy_() Vulkan -> X
vTensor::copy_() Vulkan -> CPU
```
* References:
* Function `torch.clone` in PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.clone.html
* Pytorch preferred way to copy a tensor: https://stackoverflow.com/questions/55266154/pytorch-preferred-way-to-copy-a-tensor
* `torch.memory_format`: https://pytorch.org/docs/stable/tensor_attributes.html?highlight=memory_format#torch.torch.memory_format
* `c10::MemoryFormat` definition in [/c10/core/MemoryFormat.h](https://github.com/pytorch/pytorch/blob/3e45739543fbce471fc4ed26ff079efe170de0f1/c10/core/MemoryFormat.h#L28)
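The behavior listed above (memory-format check, `empty_like`, then `copy_`) can be modeled with a minimal standalone sketch. This is not the code from the PR: `FakeTensor` and `clone_sketch` are hypothetical names, the local `MemoryFormat` enum only mirrors the enumerator names of `c10::MemoryFormat`, and treating an absent argument as `Preserve` as well as rejecting more than 4 dimensions inside `clone` are assumptions made for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Local stand-in that mirrors the enumerator names of c10::MemoryFormat.
enum class MemoryFormat { Contiguous, Preserve, ChannelsLast, ChannelsLast3d };

// Hypothetical stand-in for a Vulkan tensor: a shape plus flat storage.
struct FakeTensor {
  std::vector<std::int64_t> sizes;
  std::vector<float> data;
};

// Sketch of the clone logic described in this commit message.
FakeTensor clone_sketch(
    const FakeTensor& src,
    std::optional<MemoryFormat> optional_memory_format = std::nullopt) {
  // Assumption: a missing argument is treated as Preserve.
  const MemoryFormat memory_format =
      optional_memory_format.value_or(MemoryFormat::Preserve);

  // Only Preserve and Contiguous are accepted; anything else throws.
  if (memory_format != MemoryFormat::Preserve &&
      memory_format != MemoryFormat::Contiguous) {
    throw std::invalid_argument(
        "clone_sketch: only Preserve and Contiguous are supported");
  }

  // Assumption: the <= 4D limit is enforced here; the real check may live elsewhere.
  if (src.sizes.size() > 4) {
    throw std::invalid_argument("clone_sketch: only <= 4D tensors are supported");
  }

  // Analogue of `self = at::empty_like(src)`: same shape, fresh storage.
  FakeTensor self;
  self.sizes = src.sizes;
  self.data.resize(src.data.size());

  // Analogue of `self.copy_(src)`: copy the contents into the new storage.
  self.data = src.data;
  return self;
}
```

With this model, `clone_sketch(t)` and `clone_sketch(t, MemoryFormat::Contiguous)` return an independent copy, while `clone_sketch(t, MemoryFormat::ChannelsLast)` throws, which corresponds to the two cases exercised by `clone_success` and `clone_invalidinputs_exceptions`.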
Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on macOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN ] VulkanAPITest.clone_success
[ OK ] VulkanAPITest.clone_success (5 ms)
[ RUN ] VulkanAPITest.clone_invalidinputs_exceptions
[ OK ] VulkanAPITest.clone_invalidinputs_exceptions (1 ms)
```
Test result on macOS:
```
[ RUN ] VulkanAPITest.clone_success
[ OK ] VulkanAPITest.clone_success (19 ms)
[ RUN ] VulkanAPITest.clone_invalidinputs_exceptions
[ OK ] VulkanAPITest.clone_invalidinputs_exceptions (2 ms)
```
Reviewed By: SS-JIA
Differential Revision: D32923535
fbshipit-source-id: ea29792e1b0080cbbc1c8c7e8bf2beffad9b5c0d