vllm commit 0a9ef0cf (112 days ago)

Move query quantization to attention layer for Flashinfer & Triton. (#26534)

Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
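The commit title describes hoisting query quantization out of the individual attention backends (FlashInfer, Triton) and into the shared attention layer, so the query is quantized once before dispatch instead of separately inside each backend. As a rough, hypothetical illustration only (these function and parameter names are invented for this sketch and are not vLLM's actual code), the idea looks like this:

```python
import numpy as np

def quantize_per_tensor(x: np.ndarray):
    # Symmetric per-tensor int8 quantization: one scale for the whole tensor.
    # (Illustrative stand-in for whatever scheme the real kernels use.)
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def attention_layer_forward(query: np.ndarray, backend: str):
    # Hypothetical attention-layer entry point: the query is quantized
    # HERE, once, rather than inside each backend's own kernel wrapper.
    q_query, q_scale = quantize_per_tensor(query)
    if backend in ("flashinfer", "triton"):
        # Both backends now receive an already-quantized query plus scale.
        return q_query, q_scale
    raise ValueError(f"unknown backend: {backend}")

query = np.array([[0.5, -1.0], [0.25, 0.75]], dtype=np.float32)
q, s = attention_layer_forward(query, "triton")
```

The benefit of this placement is that the quantization logic is written and maintained in one place, and every backend that consumes quantized queries behaves consistently.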