onnxruntime
4771256b
- fix to avoid quantizing attention with varied q,k,v sizes (#9357)
Commit
4 years ago
fix to avoid quantizing attention with varied q,k,v sizes (#9357)

* fix to avoid quantizing attention with varied q,k,v sizes
* updated the changes to address the comments
References
#9357 - fix to avoid quantizing attention with varied q,k,v sizes
Author
viboga
Parents
ba0cca96