onnxruntime
QAttention calls into MatMulIntToFloat instead of Dequantize+GEMM
#16851

Merged

zhangxiang1993 merged 3 commits into DmlPrototype from user/xianz/QAttention_v2

PatriceVignola commented on 2023-07-25

jeffbloo force-pushed the DmlPrototype branch from eb6222b2 to 0790b051 2 years ago

jeffbloo requested a review 2 years ago

QAttention calls into MatMulIntToFloat instead of Dequantize+GEMM

23ad7917

rebase DmlPrototype

98bd750b

zhangxiang1993 force-pushed from 56199c76 to 98bd750b 2 years ago

consistent style

f2aff002

PatriceVignola approved these changes on 2023-07-26

zhangxiang1993 merged cbdd0bb7 into DmlPrototype 2 years ago

zhangxiang1993 deleted the user/xianz/QAttention_v2 branch 2 years ago

Reviewers

PatriceVignola

fdwr
