onnxruntime
4cb23b02 - Improvements to the INT8 GEMM portion of the code for Power (#20595)

Commit
1 year ago
These changes improve the GEMM portion of the code for Power. There are 3 main code changes:

1) Change a function to a template parameter so that operations that add/subtract zero are eliminated at compile time. Also reuse a vector that holds the mask instead of rebuilding it each time.
2) Process 16 columns at a time in MlasGemmQuantCopyPackB8x8. This should reduce potential page faults by a factor of 4 and also be faster.
3) Unroll MlasQgemmStoreVectorMMA and vectorize other variables.
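The first change (moving a runtime decision into a template parameter) can be illustrated with a minimal sketch. This is not the actual MLAS kernel code: `SumRow` and its `ApplyOffset` parameter are hypothetical names chosen for illustration, and the sketch assumes a C++17 compiler for `if constexpr`. With `ApplyOffset == false`, the subtraction is never instantiated, so nothing is left for the optimizer to remove at run time.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of MLAS: sums n bytes, optionally
// subtracting a zero-point-style offset from each element.
// ApplyOffset is a compile-time flag, so the <false> instantiation
// contains no subtraction at all.
template <bool ApplyOffset>
int32_t SumRow(const uint8_t* data, size_t n, int32_t offset) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t v = static_cast<int32_t>(data[i]);
        if constexpr (ApplyOffset) {
            v -= offset;  // only compiled into the <true> instantiation
        }
        sum += v;
    }
    return sum;
}
```

A caller selects the variant once, outside the hot loop, for example `SumRow<true>(row, n, zero_point)` when an offset is needed and `SumRow<false>(row, n, 0)` otherwise.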