onnxruntime
ecc358f0 - [QNN EP] Add LPBQ encoding support for MatMul operator (#25539)

### Description
- LPBQ encoding is Qualcomm's alternative quantization encoding format for block quantization.
- Add translation logic to recognize the LPBQ pattern on MatMul weights in a QDQ ONNX model exported by the AIMET Quantizer (a hedged sketch of this pattern check follows below).
- Prepare the corresponding QNN quantization parameters for applying Low Power Block Quantization to MatMul weights.
- Apply LPBQ fusions only for the NPU backend, since the NPU backend is currently the only one that supports the LPBQ encoding format.

### Motivation and Context
- This is needed to efficiently accelerate accuracy-sensitive large language models such as Phi-3.5 on Qualcomm's NPU accelerator.
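The sketch below is only an illustration of what the LPBQ pattern on MatMul weights might look like in a QDQ graph, assuming the per-block scales are themselves stored quantized and expanded by a second `DequantizeLinear` feeding the weight's `DequantizeLinear` scale input. The function name, file name, and exact pattern shape are hypothetical; the actual matching logic lives in the QNN EP's C++ node-unit handling and may differ.

```python
# Hypothetical sketch: scan a QDQ ONNX model for MatMul weights whose
# DequantizeLinear scale is itself produced by another DequantizeLinear,
# i.e. quantized per-block scales expanded from a per-channel scale
# (one plausible shape of an LPBQ-style encoding).
import onnx


def find_lpbq_matmul_weights(model_path: str):
    model = onnx.load(model_path)
    graph = model.graph

    # Map each tensor name to the node that produces it.
    producers = {out: node for node in graph.node for out in node.output}

    candidates = []
    for node in graph.node:
        if node.op_type != "MatMul" or len(node.input) < 2:
            continue

        # The weight (second MatMul input) should come from a DequantizeLinear.
        weight_dq = producers.get(node.input[1])
        if weight_dq is None or weight_dq.op_type != "DequantizeLinear":
            continue

        # DequantizeLinear inputs: x, x_scale, (optional) x_zero_point.
        # If the scale tensor is produced by another DequantizeLinear, the
        # block scales are stored quantized: consistent with LPBQ.
        scale_producer = producers.get(weight_dq.input[1])
        if scale_producer is not None and scale_producer.op_type == "DequantizeLinear":
            candidates.append((node.name, weight_dq.name, scale_producer.name))

    return candidates


if __name__ == "__main__":
    # "model_qdq.onnx" is a placeholder path for an AIMET-exported QDQ model.
    for matmul, weight_dq, scale_dq in find_lpbq_matmul_weights("model_qdq.onnx"):
        print(f"MatMul {matmul}: weight DQ {weight_dq}, quantized block scales from {scale_dq}")
```

In the EP itself, a match like this would then be translated into the QNN quantization parameters for Low Power Block Quantization rather than rewritten in the ONNX graph; the Python scan above is only for inspecting exported models.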