onnxruntime
155e22d1 - MLAS: fuse float output into quantized GEMM (#4215)

Commit

5 years ago

MLAS: fuse float output into quantized GEMM (#4215)

Add more variants of MlasGemm that do a u8x8 GEMM with the output type as float. This fuses the common sequence of MatMulInteger + Cast + Mul(OutputScale) + optional Add(BiasVector).
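For illustration only, the fused sequence can be sketched as a naive reference loop. This is not the MLAS implementation or its API: the function name `QuantGemmFloatOutputRef` and its parameters are hypothetical, both operands are assumed to be unsigned 8-bit, row-major, with zero points of 0, and the output scale is assumed to be a single scalar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine (NOT the MLAS API).
// Computes C[m][n] = float(sum_k A[m][k] * B[k][n]) * scale (+ bias[n]),
// i.e. MatMulInteger + Cast + Mul(OutputScale) + optional Add(BiasVector)
// in a single pass, without materializing the int32 intermediate matrix.
// Assumptions: uint8 inputs, zero points of 0, row-major layout, scalar scale.
std::vector<float> QuantGemmFloatOutputRef(
    std::size_t M, std::size_t N, std::size_t K,
    const std::vector<std::uint8_t>& A,          // M x K
    const std::vector<std::uint8_t>& B,          // K x N
    float scale,                                 // output scale
    const std::vector<float>* bias = nullptr)    // optional, length N
{
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(bias == nullptr || bias->size() == N);

    std::vector<float> C(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            // MatMulInteger: accumulate the 8-bit products in int32.
            std::int32_t acc = 0;
            for (std::size_t k = 0; k < K; ++k) {
                acc += static_cast<std::int32_t>(A[m * K + k]) *
                       static_cast<std::int32_t>(B[k * N + n]);
            }
            // Cast + Mul(OutputScale).
            float value = static_cast<float>(acc) * scale;
            // Optional Add(BiasVector): one bias value per output column.
            if (bias != nullptr) {
                value += (*bias)[n];
            }
            C[m * N + n] = value;
        }
    }
    return C;
}
```

Passing `nullptr` for `bias` skips the Add step, which mirrors the "optional" bias in the commit message. A real kernel would tile and vectorize the inner loops; the point here is only that the float conversion, scaling, and bias happen while the accumulator is still in hand.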

References

#4215 - MLAS: fuse float output into quantized GEMM

Author

tracysh

Parents

2e3607c7
