vllm
Move query quantization to attention layer for Flashinfer & Triton.
#26534
Merged

adabeyta requested a review from tdoublep 230 days ago
adabeyta requested a review from mgoin 230 days ago
mergify added v1
mergify added needs-rebase
gemini-code-assist commented on 2025-10-09
chatgpt-codex-connector commented on 2025-10-09
adabeyta force pushed 230 days ago
mergify removed needs-rebase
adabeyta Move query quant to attn layer for flashinfer & triton.
bea52f69
adabeyta force pushed to bea52f69 230 days ago
mergify added needs-rebase
ProExpertProg added this to the vllm==v0.12.0/torch==2.9.0 compilation improvements milestone 230 days ago
adabeyta Gate query quantization on q_data_type and make supports_quant_query_…
d20e0231
adabeyta requested a review from LucasWilkinson 229 days ago
adabeyta requested a review from ProExpertProg 229 days ago
elvischenv commented on 2025-10-11
adabeyta Merge branch 'main' into q_quant_attn_mv
f79d8654
mergify removed needs-rebase
ProExpertProg commented on 2025-10-13
ProExpertProg commented on 2025-10-13
ProExpertProg commented on 2025-10-13
ProExpertProg commented on 2025-10-13
adabeyta requested a review from WoosukKwon 226 days ago
adabeyta requested a review from zhuohan123 226 days ago
adabeyta requested a review from youkaichao 226 days ago
adabeyta requested a review from alexm-redhat 226 days ago
adabeyta requested a review from comaniac 226 days ago
adabeyta requested a review from njhill 226 days ago
adabeyta Move can_use_trtllm to implementation object instead of backend property
f6f933d7
adabeyta force pushed to f6f933d7 226 days ago
adabeyta requested a review from ProExpertProg 226 days ago
ProExpertProg commented on 2025-10-13
pavanimajety commented on 2025-10-13
adabeyta Merge remote-tracking branch 'origin/main' into q_quant_attn_mv
46eb6ff5
adabeyta requested a review from ProExpertProg 226 days ago
adabeyta requested a review from pavanimajety 226 days ago
ProExpertProg commented on 2025-10-13
adabeyta Merge branch 'main' into q_quant_attn_mv
9637d7f0
adabeyta force pushed 226 days ago
adabeyta requested a review from ProExpertProg 226 days ago
ProExpertProg commented on 2025-10-14
adabeyta Add attn_metadata.q_data_type matches query.dtype() assert
848158a3
adabeyta force pushed to 848158a3 226 days ago
adabeyta requested a review from ProExpertProg 226 days ago
adabeyta Merge branch 'main' into q_quant_attn_mv
39da3947
adabeyta Merge branch 'main' into q_quant_attn_mv
734f6ff7
ProExpertProg commented on 2025-10-14
adabeyta Remove supports_quant_query_input from backend in place of impl methods
af2359c7
adabeyta requested a review from ProExpertProg 225 days ago
adabeyta Merge branch 'main' into q_quant_attn_mv
b0478dce
ProExpertProg commented on 2025-10-14
ProExpertProg added ready
adabeyta Add todo for adding support to more backends.
c28f36bb
ProExpertProg approved these changes on 2025-10-14
adabeyta Merge branch 'main' into q_quant_attn_mv
633b1594
adabeyta Merge branch 'main' into q_quant_attn_mv
543a8fe2
adabeyta Update fusion attn UT to properly address query-quant
a1f41176
adabeyta Merge branch 'main' into q_quant_attn_mv
c3a92007
ProExpertProg approved these changes on 2025-10-15
ProExpertProg enabled auto-merge (squash) 224 days ago
disabled auto-merge 224 days ago
Manually disabled by user
ProExpertProg requested changes on 2025-10-15
ProExpertProg approved these changes on 2025-10-15
ProExpertProg merged 0a9ef0cf into main 224 days ago