pytorch commit 64ac4288 - [vulkan] Adaptive local work group size (#61170)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61170

Instead of using a fixed local work group size of {4, 4, 4}, adjust the size based on the global size in order to minimize the number of inactive invocations.

## Perf improvements from this change

On aloha portal devices, in conjunction with the diff below that tweaks the conv2d_pw shader to calculate a 4x4 output, benchmark latency of the xirp14b model was reduced from ~8.7 ms to ~6.6 ms.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724591

fbshipit-source-id: ede896300b2be1a9578e492cb870121012886aa7
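To illustrate the idea behind the change, here is a minimal C++ sketch of picking a local work group size from the global size. The function name `adaptive_local_size`, the per-group invocation budget of 64, and the starting shape {4, 4, 4} are assumptions for illustration only; they are not taken from the PR's implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical helper: choose a local work group size whose extents do not
// exceed the corresponding global extents, so fewer invocations fall outside
// the dispatched range and sit idle.
std::array<uint32_t, 3> adaptive_local_size(
    const std::array<uint32_t, 3>& global_size) {
  constexpr uint32_t kMaxInvocations = 64;  // assumed per-group budget

  std::array<uint32_t, 3> local{4, 4, 4};  // assumed starting shape
  // Clamp each local extent to the global extent, so a tiny global dimension
  // (e.g. a depth of 1) does not waste lanes.
  for (size_t i = 0; i < 3; ++i) {
    local[i] = std::min(local[i], std::max<uint32_t>(global_size[i], 1u));
  }

  // Re-spend the freed budget on the larger dimensions, doubling one extent
  // at a time while staying within the invocation budget.
  bool grew = true;
  while (grew) {
    grew = false;
    for (size_t i = 0; i < 3; ++i) {
      const uint32_t total = local[0] * local[1] * local[2];
      if (local[i] < global_size[i] && total * 2 <= kMaxInvocations) {
        local[i] *= 2;
        grew = true;
      }
    }
  }
  return local;
}
```

For example, a global size of {128, 128, 1} (a single-depth output) would yield a local size of {8, 8, 1} under this sketch instead of {4, 4, 4}, keeping all 64 invocations per group active rather than idling three quarters of them in the unused depth dimension.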