[Vulkan] Add performance test for GRU operator (#73126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73126
Added a performance test for the Vulkan GRU operator:
* Added the `gru_op_perf` benchmark to `vulkan_perf_test.cpp`.
* The `--benchmark_filter` flag can be used to run only the GRU perf tests:
```
adb shell "/data/local/tmp/vulkan_perf_test" --benchmark_filter=gru*
```
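Note that Google Benchmark treats the `--benchmark_filter` value as a regular expression matched anywhere in the benchmark name, not as a shell glob. Read that way, `gru*` means "`gr` followed by zero or more `u`", so it selects `gru_op_perf` but would also select any other benchmark whose name contains `gr`. The sketch below illustrates this with `std::regex`; `matches_filter` and the name `group_norm_perf` are hypothetical and only for illustration, and the regex flavor the library uses on a given platform may differ in details from `std::regex`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of vulkan_perf_test.cpp: returns true when
// `filter`, read as a regular expression, matches anywhere in `name`.
// This approximates how a regex-based benchmark filter selects benchmarks.
bool matches_filter(const std::string& filter, const std::string& name) {
  const std::regex re(filter);         // ECMAScript grammar by default
  return std::regex_search(name, re);  // partial match, not a full match
}
```

Under these assumptions, `matches_filter("gru*", "group_norm_perf")` is true because the pattern already matches the substring `gr`, while an anchored pattern such as `^gru_` matches `gru_op_perf/...` and rejects `group_norm_perf`. With only one `gr`-prefixed benchmark in the binary the two filters select the same tests.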
Test Plan:
Test command line:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test" --benchmark_filter=gru*
```
Test result:
```
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations
------------------------------------------------------------------------------------------------
gru_op_perf/N:384/C:384/H:2/iterations:1000/threads:1       16.7 ms         14.7 ms         1000
```
Reviewed By: SS-JIA
Differential Revision: D34355119
fbshipit-source-id: 049dc4b47938a04e395923e761e59304e8fa1f7d
(cherry picked from commit 39c8b7e4d7c408d867e0a08443199b0de1c5faf5)