onnxruntime
ecc358f0 - [QNN EP] Add LPBQ encoding support for MatMul operator (#25539)

### Description
- LPBQ encoding is Qualcomm's alternative quantization encoding format for block quantization.
- Add translation logic to recognize the LPBQ pattern on MatMul weights in a QDQ ONNX model exported by the AIMET Quantizer (a hedged sketch of this pattern check follows below).
- Prepare the corresponding QNN quantization parameters for applying Low Power Block Quantization to MatMul weights.
- Apply LPBQ fusions only for the NPU backend, since the NPU backend is currently the only one that supports the LPBQ encoding format.

### Motivation and Context
- This is needed to efficiently accelerate accuracy-sensitive large language models such as Phi-3.5 on Qualcomm's NPU accelerator.
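The sketch below is only an illustration of what the LPBQ pattern on MatMul weights might look like in a QDQ graph, assuming the per-block scales are themselves stored quantized and expanded by a second `DequantizeLinear` feeding the weight's `DequantizeLinear` scale input. The function name, file name, and exact pattern shape are hypothetical; the actual matching logic lives in the QNN EP's C++ node-unit handling and may differ.

```python
# Hypothetical sketch: scan a QDQ ONNX model for MatMul weights whose
# DequantizeLinear scale is itself produced by another DequantizeLinear,
# i.e. quantized per-block scales expanded from a per-channel scale
# (one plausible shape of an LPBQ-style encoding).
import onnx


def find_lpbq_matmul_weights(model_path: str):
    model = onnx.load(model_path)
    graph = model.graph

    # Map each tensor name to the node that produces it.
    producers = {out: node for node in graph.node for out in node.output}

    candidates = []
    for node in graph.node:
        if node.op_type != "MatMul" or len(node.input) < 2:
            continue

        # The weight (second MatMul input) should come from a DequantizeLinear.
        weight_dq = producers.get(node.input[1])
        if weight_dq is None or weight_dq.op_type != "DequantizeLinear":
            continue

        # DequantizeLinear inputs: x, x_scale, (optional) x_zero_point.
        # If the scale tensor is produced by another DequantizeLinear, the
        # block scales are stored quantized: consistent with LPBQ.
        scale_producer = producers.get(weight_dq.input[1])
        if scale_producer is not None and scale_producer.op_type == "DequantizeLinear":
            candidates.append((node.name, weight_dq.name, scale_producer.name))

    return candidates


if __name__ == "__main__":
    # "model_qdq.onnx" is a placeholder path for an AIMET-exported QDQ model.
    for matmul, weight_dq, scale_dq in find_lpbq_matmul_weights("model_qdq.onnx"):
        print(f"MatMul {matmul}: weight DQ {weight_dq}, quantized block scales from {scale_dq}")
```

In the EP itself, a match like this would then be translated into the QNN quantization parameters for Low Power Block Quantization rather than rewritten in the ONNX graph; the Python scan above is only for inspecting exported models.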