pytorch
234f00e1 - [PyTorch][Vulkan] Add a matrix multiplication performance test binary and fix GPU latency measurement (#108266)

Commit

1 year ago

[PyTorch][Vulkan] Add a matrix multiplication performance test binary and fix GPU latency measurement (#108266)

Summary:
- Added a new matmul perf test binary as target `pt_vulkan_mm_perf_test_bin`
- Also renamed the existing `vulkan_perf_test_bin` to `vulkan_conv_arithmetic_perf_test_bin` with associated source file name change
- **Fixed the manual time benchmark measurement for both performance binaries, which was not tracking the correct opnames (e.g. checked for runtime of nonexistent "mm" instead of "vulkan.mm")**

Test Plan:
# pt_vulkan_mm_perf_test_bin
- build the matrix multiplication performance test binary
```
~/fbsource » buck2 build -c ndk.debug_info_level=0 -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_mm_perf_test_binAndroid --show-output -c pt.vulkan_full_precision=1
[...]
BUILD SUCCEEDED
fbsource//xplat/caffe2:pt_vulkan_mm_perf_test_binAndroid buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_mm_perf_test_binAndroid__/pt_vulkan_mm_perf_test_binAndroid
```
- test on arm32 android device
```
~/fbsource » adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_mm_perf_test_binAndroid__/pt_vulkan_mm_perf_test_binAndroid /data/local/tmp/
~/fbsource » adb shell /data/local/tmp/pt_vulkan_mm_perf_test_binAndroid
```
- output P817269023, excerpt below
```
Kernel Name             Workgroup Size     Duration (ns)
===========             ==============       ===========
vulkan.nchw_to_image    {500, 500, 1}            4336072
vulkan.nchw_to_image    {250, 250, 1}            1106716
vulkan.nchw_to_image    {1, 1, 1}                   7228
vulkan.mm               {250, 250, 1}          132570256
[...]
vulkan.mm               {250, 250, 1}           80492152
vulkan.image_to_nchw    {500, 500, 1}            1420328
-------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                  Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------------------
mm_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1      91047 ms          143 ms            5
```

# pt_vulkan_conv_arithmetic_perf_test_bin
- build the convolution and arithmetic performance test binary
```
~/fbsource » buck2 build -c ndk.debug_info_level=0 -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_conv_arithmetic_perf_test_binAndroid --show-output -c pt.vulkan_full_precision=1
[...]
BUILD SUCCEEDED
fbsource//xplat/caffe2:pt_vulkan_conv_arithmetic_perf_test_binAndroid buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_conv_arithmetic_perf_test_binAndroid__/pt_vulkan_conv_arithmetic_perf_test_binAndroid
```
- test on arm32 android device
```
~/fbsource » adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_conv_arithmetic_perf_test_binAndroid__/pt_vulkan_conv_arithmetic_perf_test_binAndroid /data/local/tmp/
~/fbsource » adb shell /data/local/tmp/pt_vulkan_conv_arithmetic_perf_test_binAndroid
2023-07-20T20:23:26+00:00
```
- output P817267332, excerpt below
```
Kernel Name             Workgroup Size     Duration (ns)
===========             ==============       ===========
vulkan.add              {193, 221, 30}          39475696
vulkan.image_to_nchw    {193, 221, 30}          13463424
vulkan.add              {193, 221, 30}          72950176
vulkan.image_to_nchw    {193, 221, 30}          17792684
[...]
vulkan.add              {193, 221, 30}          72986368
vulkan.image_to_nchw    {193, 221, 30}          15921672
----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                          Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------------------------
add_op_benchmark/N:3/C:40/H:221/W:193/iterations:100/manual_time/threads:1      73242 ms          602 ms          100
libc++abi: terminating due to uncaught exception of type c10::Error: Copy of vulkan quantized tensors to cpu is currently disabled!
```

Reviewed By: yipjustin

Differential Revision: D48798710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108266
Approved by: https://github.com/manuelcandales
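The opname fix in the summary can be illustrated with a small sketch. It is not the benchmark's C++ code: `KernelTiming` and `total_duration_ns` are hypothetical names, and the exact-match lookup is an assumption about how the manual-time measurement selects kernels. The rows are the ones visible in the mm excerpt of the test plan (rows elided by `[...]` are not included).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelTiming:
    """One row of the GPU diagnostics table (hypothetical container)."""
    kernel_name: str
    duration_ns: int


def total_duration_ns(timings, op_name):
    """Sum the durations of entries whose kernel name equals op_name exactly."""
    return sum(t.duration_ns for t in timings if t.kernel_name == op_name)


# Rows visible in the mm excerpt; elided rows are left out.
timings = [
    KernelTiming("vulkan.nchw_to_image", 4336072),
    KernelTiming("vulkan.nchw_to_image", 1106716),
    KernelTiming("vulkan.nchw_to_image", 7228),
    KernelTiming("vulkan.mm", 132570256),
    KernelTiming("vulkan.mm", 80492152),
    KernelTiming("vulkan.image_to_nchw", 1420328),
]

# Looking up the unprefixed name matches no kernel, so no GPU time is counted.
print(total_duration_ns(timings, "mm"))

# Looking up the name as it appears in the table picks up both mm dispatches.
print(total_duration_ns(timings, "vulkan.mm"))
```

Under that assumption, a lookup for "mm" silently contributes zero nanoseconds, which is consistent with the summary's description of the measurement tracking a nonexistent opname.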

Author

liuk22

Committer

pytorchmergebot

Parents

8f028845
