[Vulkan] Thread-safe Vulkan backend for OSS (#69576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69576
The Vulkan backend is now thread-safe by default in OSS builds as well:
* Removed the `MAKE_VULKAN_THREADSAFE` preprocessor macro and the conditional blocks that depended on it
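For readers unfamiliar with the pattern, here is a minimal sketch of what "thread-safe by default" can look like once a compile-time guard is gone: the lock is taken unconditionally rather than only when a macro is defined. `CommandCounter` and `run_concurrent_submissions` are hypothetical names used for illustration only; they are not part of the PyTorch Vulkan backend, and the actual change may differ in detail.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared backend resource (not the PyTorch API).
class CommandCounter {
 public:
  // The lock is always taken; there is no #ifdef guard around it.
  void submit() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++submitted_;
  }

  std::size_t submitted() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return submitted_;
  }

 private:
  mutable std::mutex mutex_;
  std::size_t submitted_ = 0;
};

// Spawns `threads` workers that each call submit() `per_thread` times,
// then returns the total number of recorded submissions.
std::size_t run_concurrent_submissions(std::size_t threads,
                                       std::size_t per_thread) {
  CommandCounter counter;
  std::vector<std::thread> workers;
  workers.reserve(threads);
  for (std::size_t t = 0; t < threads; ++t) {
    workers.emplace_back([&counter, per_thread] {
      for (std::size_t i = 0; i < per_thread; ++i) {
        counter.submit();
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return counter.submitted();
}
```

With the lock always in place, `run_concurrent_submissions(3, 1000)` should account for every submission regardless of build flags. The trade-off is a small locking cost even in single-threaded use.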
Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
Test build on macOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```
Test result on Google Pixel 5:
```
//xplat/caffe2:pt_vulkan_perf_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 145.4 MB/s (826929592 bytes in 5.426s)
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time          CPU   Iterations
--------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1     39.3 ms      10.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1     27.1 ms      5.86 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1     58.5 ms      11.8 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1      5.98 ms     0.803 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1      9.14 ms     0.857 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3     32.1 ms      31.3 ms         3000
```
Test result on macOS:
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 18.89, 29.61, 24.95
***WARNING*** Library was built as DEBUG. Timings may be affected.
--------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time          CPU   Iterations
--------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1     53.3 ms      39.6 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1     28.0 ms      20.7 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1     51.8 ms      38.7 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1      2.76 ms      1.31 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1      2.29 ms      1.11 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3     49.2 ms      41.8 ms         3000
```
Reviewed By: SS-JIA
Differential Revision: D32933891
fbshipit-source-id: d8ebd5394771e1d79230c1f3aa8fbec4472b3197