vllm
0a9ef0cf
- Move query quantization to attention layer for Flashinfer & Triton. (#26534)
Commit
112 days ago
Move query quantization to attention layer for Flashinfer & Triton. (#26534)

Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
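The diff itself is not shown on this page. As a rough illustration of the pattern the commit title names, the sketch below quantizes the query tensor to FP8 inside the attention layer's forward pass, so that backends such as FlashInfer and Triton receive an already-quantized query instead of each quantizing it themselves. All names here (quantize_fp8, Attention, backend.supports_fp8_query, backend.run) are hypothetical and do not reflect the actual vLLM API.

import torch

def quantize_fp8(t: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor FP8 quantization (hypothetical helper).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    return (t / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)

class Attention(torch.nn.Module):
    def __init__(self, backend, q_scale: float = 1.0):
        super().__init__()
        # 'backend' stands in for a FlashInfer- or Triton-based kernel wrapper.
        self.backend = backend
        self.register_buffer("q_scale", torch.tensor(q_scale))

    def forward(self, q, k, v):
        # Query quantization happens here, in the shared attention layer,
        # rather than being duplicated inside each backend implementation.
        if self.backend.supports_fp8_query:
            q = quantize_fp8(q, self.q_scale)
        return self.backend.run(q, k, v)

Centralizing the quantization step this way keeps the per-backend kernels simpler and ensures every backend applies the same query scale.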
References
#26534 - Move query quantization to attention layer for Flashinfer & Triton.
Author
adabeyta
Parents
e5b438a2