onnxruntime
b49e3b17 - POWER : Implement MlasGemmQuantKernel using VSX builtins for M = 1 (#25490)

Commit

239 days ago

POWER : Implement MlasGemmQuantKernel using VSX builtins for M = 1 (#25490)

POWER: Added a VSX-based implementation of MlasGemmQuantKernel optimized for the case M = 1. Verified correctness using ONNX Runtime's built-in tests and onnxruntime_mlas_tests; no regressions observed. Evaluated performance using a Granite 8-bit quantized model and observed approximately 3-5% improvement in token generation speed.

### Description

When M = 1, the multiplication is performed using the VSX vector builtin vec_msum.

### Motivation and Context

To improve token generation performance for models with a batch size of 1.
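For readers unfamiliar with the multiply-sum pattern, the following is a minimal portable sketch of an M = 1 quantized product, not the code from this commit. The names `emulated_msum` and `gemv_quant_m1` are hypothetical, the signed-by-unsigned operand choice is an assumption, and B is assumed to be laid out so that the K values of each output column are contiguous. The helper imitates in scalar code what a single `vec_msum` step does on 16 byte pairs: each of four 32-bit lanes accumulates four adjacent byte products.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for one 128-bit multiply-sum step (hypothetical helper).
// Lane i of the result is acc[i] plus the sum of a[4i+j] * b[4i+j], j = 0..3.
std::array<int32_t, 4> emulated_msum(const std::array<int8_t, 16>& a,
                                     const std::array<uint8_t, 16>& b,
                                     std::array<int32_t, 4> acc) {
    for (std::size_t lane = 0; lane < 4; ++lane) {
        for (std::size_t j = 0; j < 4; ++j) {
            acc[lane] += static_cast<int32_t>(a[4 * lane + j]) *
                         static_cast<int32_t>(b[4 * lane + j]);
        }
    }
    return acc;
}

// Computes C[n] = sum over k of A[k] * B[n * K + k] for a single row A (M = 1).
// Assumption: B holds N columns, each stored as K contiguous bytes.
std::vector<int32_t> gemv_quant_m1(const std::vector<int8_t>& A,
                                   const std::vector<uint8_t>& B,
                                   std::size_t K, std::size_t N) {
    std::vector<int32_t> C(N, 0);
    for (std::size_t n = 0; n < N; ++n) {
        std::array<int32_t, 4> acc{0, 0, 0, 0};
        std::size_t k = 0;
        // Main loop: consume K in blocks of 16 bytes, one multiply-sum per block.
        for (; k + 16 <= K; k += 16) {
            std::array<int8_t, 16> a{};
            std::array<uint8_t, 16> b{};
            for (std::size_t j = 0; j < 16; ++j) {
                a[j] = A[k + j];
                b[j] = B[n * K + k + j];
            }
            acc = emulated_msum(a, b, acc);
        }
        // Horizontal reduction of the four lanes into one scalar.
        int32_t sum = acc[0] + acc[1] + acc[2] + acc[3];
        // Scalar tail for the K elements that do not fill a 16-byte block.
        for (; k < K; ++k) {
            sum += static_cast<int32_t>(A[k]) *
                   static_cast<int32_t>(B[n * K + k]);
        }
        C[n] = sum;
    }
    return C;
}
```

With M = 1 the product reduces to one dot product per output column, so the work is dominated by the inner multiply-sum loop. A real kernel would also handle zero points, packing, and whatever operand signedness the surrounding code actually uses, all of which this sketch leaves out.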

References

#25490 - POWER : Implement MlasGemmQuantKernel using VSX builtins for M = 1

Author

BODAPATIMAHESH

Parents

7493b8bf
