vllm
Move query quantization to attention layer for Flashinfer & Triton.
#26534
Merged
ProExpertProg merged 16 commits into vllm-project:main from adabeyta:q_quant_attn_mv
adabeyta requested a review from tdoublep 230 days ago
adabeyta requested a review from mgoin 230 days ago
mergify added v1
mergify added needs-rebase
gemini-code-assist commented on 2025-10-09
chatgpt-codex-connector commented on 2025-10-09
adabeyta force pushed 230 days ago
mergify removed needs-rebase
Move query quant to attn layer for flashinfer & triton. (bea52f69)
adabeyta force pushed to bea52f69 230 days ago
mergify added needs-rebase
ProExpertProg added this to the vllm==v0.12.0/torch==2.9.0 compilation improvements milestone 230 days ago
Gate query quantization on q_data_type and make supports_quant_query_… (d20e0231)
adabeyta requested a review from LucasWilkinson 229 days ago
adabeyta requested a review from ProExpertProg 229 days ago
elvischenv commented on 2025-10-11
Merge branch 'main' into q_quant_attn_mv (f79d8654)
mergify removed needs-rebase
ProExpertProg commented on 2025-10-13
ProExpertProg commented on 2025-10-13
ProExpertProg commented on 2025-10-13
ProExpertProg commented on 2025-10-13
adabeyta requested a review from WoosukKwon 226 days ago
adabeyta requested a review from zhuohan123 226 days ago
adabeyta requested a review from youkaichao 226 days ago
adabeyta requested a review from alexm-redhat 226 days ago
adabeyta requested a review from comaniac 226 days ago
adabeyta requested a review from njhill 226 days ago
Move can_use_trtllm to implementation object instead of backend property (f6f933d7)
adabeyta force pushed to f6f933d7 226 days ago
adabeyta requested a review from ProExpertProg 226 days ago
ProExpertProg commented on 2025-10-13
pavanimajety commented on 2025-10-13
Merge remote-tracking branch 'origin/main' into q_quant_attn_mv (46eb6ff5)
adabeyta requested a review from ProExpertProg 226 days ago
adabeyta requested a review from pavanimajety 226 days ago
ProExpertProg commented on 2025-10-13
Merge branch 'main' into q_quant_attn_mv (9637d7f0)
adabeyta force pushed 226 days ago
adabeyta requested a review from ProExpertProg 226 days ago
ProExpertProg commented on 2025-10-14
Add attn_metadata.q_data_type matches query.dtype() assert (848158a3)
adabeyta force pushed to 848158a3 226 days ago
adabeyta requested a review from ProExpertProg 226 days ago
Merge branch 'main' into q_quant_attn_mv (39da3947)
Merge branch 'main' into q_quant_attn_mv (734f6ff7)
ProExpertProg commented on 2025-10-14
Remove supports_quant_query_input from backend in place of impl methods (af2359c7)
adabeyta requested a review from ProExpertProg 225 days ago
Merge branch 'main' into q_quant_attn_mv (b0478dce)
ProExpertProg commented on 2025-10-14
ProExpertProg added ready
Add todo for adding support to more backends. (c28f36bb)
ProExpertProg approved these changes on 2025-10-14
Merge branch 'main' into q_quant_attn_mv (633b1594)
Merge branch 'main' into q_quant_attn_mv (543a8fe2)
Update fusion attn UT to properly address query-quant (a1f41176)
Merge branch 'main' into q_quant_attn_mv (c3a92007)
ProExpertProg approved these changes on 2025-10-15
ProExpertProg enabled auto-merge (squash) 224 days ago
Auto-merge disabled 224 days ago (manually disabled by user)
ProExpertProg requested changes on 2025-10-15
ProExpertProg approved these changes on 2025-10-15
ProExpertProg merged 0a9ef0cf into main 224 days ago
Reviewers
ProExpertProg
pavanimajety
chatgpt-codex-connector
elvischenv
gemini-code-assist
tdoublep
mgoin
LucasWilkinson
WoosukKwon
zhuohan123
youkaichao
alexm-redhat
comaniac
njhill
Assignees
No one assigned
Labels
ready
v1
Milestone
torch==2.9.0 compilation improvements