bab1ea85 - [Vulkan] Optimize LSTM operator with pre-packing (#79702)

Summary: Optimized the LSTM operator by pre-packing its weights and biases in the Vulkan GPU backend.

- The weights and biases are always on the CPU side by design.
- The packed and unpacked data are stored in a `VulkanOpContext`.
- Ops:
  - `at::native::vulkan::ops::create_lstm_context`: creates a `VulkanOpContext` object holding the packed and unpacked data, and returns a pointer to it.
  - `at::native::vulkan::ops::run_lstm_context`: takes the three Vulkan input tensors (input sequence, initial hidden state, and initial cell state) plus a pointer to the context, and runs the LSTM operation.
- Registered the ops in [Register.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Register.cpp).
- Rewrote the LSTM subgraph function in [vulkan_rewrite.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/vulkan_rewrite.cpp) so that `create_lstm_context` and `run_lstm_context` are executed instead on the Vulkan GPU backend.
- Added a new test for the LSTM pre-packing and run ops: `lstm_prepack_success`.

Test Plan: buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac

Reviewed By: SS-JIA
Differential Revision: D37052597
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79702
Approved by: https://github.com/SS-JIA
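To illustrate the split the commit describes, here is a minimal Python sketch of the prepack-then-run pattern. The names `LSTMContext`, `create_lstm_context`, and `run_lstm_context` mirror the ops above but are plain-Python stand-ins, not the real Vulkan backend; the "LSTM" is reduced to a single scalar recurrence so the example stays self-contained. The point is the structure: weight packing happens once at context-creation time, while the per-call op only touches the inputs.

```python
class LSTMContext:
    """Stand-in for VulkanOpContext: holds weights packed once, ahead of inference."""
    def __init__(self, w_ih, w_hh, bias):
        # In the real backend, packing lays tensors out in a GPU-friendly
        # format; here we simply store the values computed at load time.
        self.w_ih = w_ih
        self.w_hh = w_hh
        self.bias = bias


def create_lstm_context(w_ih, w_hh, bias):
    # Runs once, at model-load time (weights and biases live on the CPU side).
    return LSTMContext(w_ih, w_hh, bias)


def run_lstm_context(inputs, h0, ctx):
    # Runs per inference call; only the inputs and initial state change.
    h = h0
    for x in inputs:
        h = ctx.w_ih * x + ctx.w_hh * h + ctx.bias
    return h


ctx = create_lstm_context(w_ih=0.5, w_hh=0.1, bias=1.0)
print(run_lstm_context([1.0, 2.0], h0=0.0, ctx=ctx))  # prints 2.15
```

The graph rewrite in `vulkan_rewrite.cpp` performs the analogous transformation at the TorchScript level: it replaces a plain `aten::lstm` call with a `create_lstm_context` node (which can be folded at load time) feeding a `run_lstm_context` node.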