onnxruntime
54086d8e - [QNN EP] Optimize Session Creation Time (#26628)

Commit
39 days ago
### Description

`qnn::utils::TwoDimensionTranspose` is the bottleneck during session creation because it copies the tensor element by element with a `memcpy` inside a double for loop. For a large weight this is very slow, and `ReshapeGemmFusion` calls it three times in total:

```
QnnModel::ComposeGraph → ReshapeGemmFusion::AddToModelBuilder → CreateOrValidateOnQnn → qnn::utils::TwoDimensionTranspose
QNNExecutionProvider::GetCapability → QNNExecutionProvider::GetSupportedNodes → ReshapeGemmFusion::IsSupported → CreateOrValidateOnQnn → qnn::utils::TwoDimensionTranspose (do QNN OP validation)
QNNExecutionProvider::GetCapability → QNNExecutionProvider::GetSupportedNodes → onnxruntime::qnn::ReshapeGemmFusion::IsSupported → CreateOrValidateOnQnn → qnn::utils::TwoDimensionTranspose (do QNN OP validation)
```

This change avoids the heavy `memcpy` by using a dummy tensor when only shape validation is required.

### Motivation and Context

Timings below were measured on a tensor with shape [2304, 700].

| Variant        | TwoDimensionTranspose_1 | TwoDimensionTranspose_2 | TwoDimensionTranspose_3 | SessionCreationTime |
|----------------|-------------------------|-------------------------|-------------------------|---------------------|
| original       | 88.39 ms                | 57.80 ms                | 53.09 ms                | 9.41871 s           |
| avoid 2 memcpy | 51.52 ms                | 12.00 ms                | 8.05 ms                 | 9.05975 s           |

---------

Co-authored-by: Kuan-Yu Lin <kuanyul@qti.qualcomm.com>
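The idea described in this commit, skipping the element-wise copy when the caller only needs the transposed shape for validation, can be sketched roughly as follows. This is a minimal illustration and not the code from the change: `Transposed2D`, `TransposeOrDescribe`, and the `validate_only` flag are hypothetical names, and the real `qnn::utils::TwoDimensionTranspose` signature is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result type (not an onnxruntime type): the transposed shape
// plus the transposed bytes. `data` stays empty in validation-only mode.
struct Transposed2D {
  std::vector<uint32_t> shape;
  std::vector<uint8_t> data;
};

// Hypothetical helper: transposes a row-major [rows, cols] buffer of
// fixed-size elements into [cols, rows]. When validate_only is true, only
// the output shape is produced, so the double-loop memcpy never runs.
Transposed2D TransposeOrDescribe(const std::vector<uint32_t>& shape,
                                 const std::vector<uint8_t>& src,
                                 size_t elem_size,
                                 bool validate_only) {
  assert(shape.size() == 2);
  const size_t rows = shape[0];
  const size_t cols = shape[1];

  Transposed2D out;
  out.shape = {shape[1], shape[0]};
  if (validate_only) {
    return out;  // shape-only "dummy" tensor, no data copied
  }

  assert(src.size() == rows * cols * elem_size);
  out.data.resize(src.size());
  // Element-by-element copy: this is the expensive path for large weights.
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      std::memcpy(out.data.data() + (c * rows + r) * elem_size,
                  src.data() + (r * cols + c) * elem_size,
                  elem_size);
    }
  }
  return out;
}
```

In this sketch, the two validation paths would pass `validate_only = true` and only the graph composition path would pay for the copy, which is in line with the "avoid 2 memcpy" row of the table.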