onnxruntime
b49e3b17 - POWER : Implement MlasGemmQuantKernel using VSX builtins for M = 1 (#25490)

Commit
239 days ago
POWER: Added a VSX-based implementation of MlasGemmQuantKernel optimized for the case M = 1. Verified correctness using ONNX Runtime's built-in tests and onnxruntime_mlas_tests; no regressions observed. Evaluated performance using a Granite 8-bit quantized model and observed an approximately 3-5% improvement in token generation speed.

### Description
When M = 1, the multiplication is performed using the VSX vector builtin vec_msum.

### Motivation and Context
To improve token generation performance for models with a batch size of 1.
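
To make the vec_msum step in the Description concrete, here is a minimal, portable sketch of what an M = 1 quantized product looks like when built from multiply-sum steps. It is not the code from this commit: `MsumEmulated` is a hypothetical scalar stand-in for a single vec_msum on 16 byte pairs (each of the four 32-bit lanes accumulates four adjacent byte products), and `GemvM1`, the column-contiguous layout of B, and the choice of uint8 for A and int8 for B are all assumptions made for illustration. Zero-point correction and the packing used by the real kernel are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Four 32-bit accumulator lanes, mirroring a "vector signed int".
using Acc4 = std::array<int32_t, 4>;

// Hypothetical scalar emulation of one multiply-sum step over 16 byte pairs:
// lane j accumulates a[4j..4j+3] * b[4j..4j+3] on top of its previous value.
inline Acc4 MsumEmulated(const uint8_t* a, const int8_t* b, Acc4 acc) {
    for (int lane = 0; lane < 4; ++lane) {
        for (int i = 0; i < 4; ++i) {
            acc[lane] += static_cast<int32_t>(a[4 * lane + i]) *
                         static_cast<int32_t>(b[4 * lane + i]);
        }
    }
    return acc;
}

// Hypothetical M = 1 kernel: c[n] = sum over k of a[k] * B[k][n].
// Assumption: B is stored so that each output column's K values are
// contiguous (b_cols[n * K + k]); the real kernel's packing may differ.
inline std::vector<int32_t> GemvM1(const std::vector<uint8_t>& a,
                                   const std::vector<int8_t>& b_cols,
                                   std::size_t K, std::size_t N) {
    assert(a.size() == K);
    assert(b_cols.size() == K * N);
    std::vector<int32_t> c(N, 0);
    for (std::size_t n = 0; n < N; ++n) {
        const int8_t* col = b_cols.data() + n * K;
        Acc4 acc{};  // zero-initialized lanes
        std::size_t k = 0;
        // Main loop: consume 16 bytes of K per multiply-sum step.
        for (; k + 16 <= K; k += 16) {
            acc = MsumEmulated(a.data() + k, col + k, acc);
        }
        // Horizontal reduction of the four lanes.
        int32_t sum = acc[0] + acc[1] + acc[2] + acc[3];
        // Scalar tail for K not divisible by 16.
        for (; k < K; ++k) {
            sum += static_cast<int32_t>(a[k]) * static_cast<int32_t>(col[k]);
        }
        c[n] = sum;
    }
    return c;
}
```

With M = 1 there is only a single row of A, so the same 16 bytes of A can be reused against every column of B, and the whole inner loop reduces to a sequence of multiply-sum steps followed by one horizontal add per output. On POWER the emulated step would be replaced by the actual builtin operating on vector registers.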