vllm
0a9ef0cf
- Move query quantization to attention layer for Flashinfer & Triton. (#26534)
Commit
112 days ago
Move query quantization to attention layer for Flashinfer & Triton. (#26534)

Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
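The diff itself is not shown on this page. As a rough illustration of the pattern the commit title names, the sketch below quantizes the query tensor to FP8 inside the attention layer's forward pass, so that backends such as FlashInfer and Triton receive an already-quantized query instead of each quantizing it themselves. All names here (quantize_fp8, Attention, backend.supports_fp8_query, backend.run) are hypothetical and do not reflect the actual vLLM API.

import torch

def quantize_fp8(t: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor FP8 quantization (hypothetical helper).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    return (t / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)

class Attention(torch.nn.Module):
    def __init__(self, backend, q_scale: float = 1.0):
        super().__init__()
        # 'backend' stands in for a FlashInfer- or Triton-based kernel wrapper.
        self.backend = backend
        self.register_buffer("q_scale", torch.tensor(q_scale))

    def forward(self, q, k, v):
        # Query quantization happens here, in the shared attention layer,
        # rather than being duplicated inside each backend implementation.
        if self.backend.supports_fp8_query:
            q = quantize_fp8(q, self.q_scale)
        return self.backend.run(q, k, v)

Centralizing the quantization step this way keeps the per-backend kernels simpler and ensures every backend applies the same query scale.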
References
#26534 - Move query quantization to attention layer for Flashinfer & Triton.
Author
adabeyta
Parents
e5b438a2