onnxruntime
cbdd0bb7
- QAttention calls into MatMulIntToFloat instead of Dequantize+GEMM (#16851)
Commit
2 years ago
QAttention calls into MatMulIntToFloat instead of Dequantize+GEMM (#16851)

### Description
Update QAttention to call into MatMulIntToFloat instead of Dequantize+GEMM, which enables more metacommand paths.
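
The arithmetic behind this swap can be sketched in plain Python. This is an illustration, not ONNX Runtime or DirectML code: the function names are hypothetical, the input values are made up, and it assumes per-tensor scalar scales and zero points. It shows that dequantizing both operands and running a float GEMM gives the same result as accumulating in integers and rescaling once, which is the form a fused integer-to-float matmul would take.

```python
# Illustrative reference math only; all names and values here are hypothetical.

def dequantize(q, scale, zero_point):
    """Map an integer matrix to floats: (q - zero_point) * scale."""
    return [[(v - zero_point) * scale for v in row] for row in q]


def matmul(a, b):
    """Naive matrix product of two row-major nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dequantize_then_gemm(a_q, a_scale, a_zp, b_q, b_scale, b_zp):
    """Two-step path: dequantize both operands, then run a float GEMM."""
    return matmul(dequantize(a_q, a_scale, a_zp), dequantize(b_q, b_scale, b_zp))


def matmul_int_to_float(a_q, a_scale, a_zp, b_q, b_scale, b_zp):
    """Fused path: subtract zero points, accumulate in integers, rescale once."""
    a_int = [[v - a_zp for v in row] for row in a_q]
    b_int = [[v - b_zp for v in row] for row in b_q]
    acc = matmul(a_int, b_int)
    return [[v * (a_scale * b_scale) for v in row] for row in acc]


A_Q = [[130, 120], [128, 255]]  # made-up uint8 activations, zero point 128
B_Q = [[1, -2], [3, 4]]         # made-up int8 weights, zero point 0

two_step = dequantize_then_gemm(A_Q, 0.5, 128, B_Q, 0.25, 0)
fused = matmul_int_to_float(A_Q, 0.5, 128, B_Q, 0.25, 0)
print(two_step == fused)
```

The scales here are powers of two, so both paths agree exactly. With arbitrary scales the two results can differ by floating-point rounding, so a tolerance-based comparison would be the safer check.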
References
#16851 - QAttention calls into MatMulIntToFloat instead of Dequantize+GEMM
#18530 - Add TryConvertTensorToBroadcastScalar for QAttention and MatMulIntToFloat
Author
zhangxiang1993
Parents
c19e4c02