vllm
fused qknorm+rope kernel optimization for SM9.0
#37376
Merged
vllm-bot merged 24 commits into vllm-project:main from EricccYang:apply-async-multitoken-kernel
gemini-code-assist commented on 2026-03-18
yewentao256 commented on 2026-03-18
ZJY0516 commented on 2026-03-31
ProExpertProg approved these changes on 2026-03-19
EricccYang force pushed 39 days ago
EricccYang requested reviews 39 days ago from: russellb, noooop, NickLucche, tjtanaa, gshtras, vadiklyutiy, tdoublep, patrickvonplaten, luccafong, sighingnow, tomeras91, jikunshang, bigPYJ1151, hmellor, markmc, ApostaC, orozery, jeejeelee, mgoin, youkaichao, WoosukKwon, robertgshaw2-redhat, njhill, ywang96, alexm-redhat, heheda12345, aarnphm, DarkLight1337, pavanimajety, tlrmchlsmth, benchislett, MatthewBonanni, 22quinn, houseroad, zhuohan123, LucasWilkinson, chaunceyjiang, zou3519, BoyuanFeng
mergify added labels: documentation, ci/build, deepseek, frontend, llama, multi-modality, new-model, performance, qwen, gpt-oss, nvidia, rocm, intel-gpu, cpu, structured-output, speculative-decoding, v1, tpu, tool-calling
mergify assigned sangstar 39 days ago
mergify added labels: kv-connector, needs-rebase
EricccYang force pushed 39 days ago
mergify removed labels: tpu, needs-rebase
EricccYang force pushed 39 days ago
EricccYang force pushed 39 days ago
ZJY0516 removed labels: documentation, new-model, rocm, structured-output, frontend, intel-gpu, speculative-decoding, v1, multi-modality, tool-calling, llama, deepseek, cpu, gpt-oss, kv-connector
ZJY0516 added label: ready
yewentao256 commented on 2026-04-01
EricccYang force pushed 35 days ago
EricccYang force pushed 35 days ago
ProExpertProg approved these changes on 2026-04-07
fused_qknorm_rope: add cp.async, multi-token-head kernel, and thresho… (aac46880)
refactor: align launch function structure with improve kernel style (9667299f)
add head_dim-aware threshold dispatch for token_heads_per_warp (25fc49a5)
add SM 9.0 guard for token_heads_per_warp dispatch and forced_token_h… (a9d0e46f)
clear comment (9ffc8f8f)
refactor(csrc): extract cp.async helpers to async_util.cuh (5a5a5ec6)
fix(csrc): make cuda_async helpers valid on host and non-sm80 paths (a7c3ba7f)
remove comment (f249b9d7)
fused_qknorm_rope: add cp.async, multi-token-head kernel, and thresho… (447e6e24)
refactor: align launch function structure with improve kernel style (3108ec3f)
fix(csrc): use getDeviceProperties(device_id) for correct multi-GPU d… (13433d05)
fix(csrc): add cp.async 16B alignment check for NTokenHeads kernel (7f4093fa)
fix(csrc): fallback to base kernel on cp.async alignment mismatch (6266b4a8)
fix: pass forced_token_heads_per_warp through fusion and defunctional… (bc01680a)
fix(csrc): remove duplicate launch function definitions from rebase (133dccc1)
test: temporarily force token_heads_per_warp=1 for nsys verification (4d62fed5)
revert: restore forced_token_heads_per_warp=-1 (auto) after verification (b4e70f8a)
style: apply clang-format to C++ sources (7115185f)
ci: trigger re-run (83dcdcb3)
EricccYang force pushed to 83dcdcb3 31 days ago
mergify added label: needs-rebase
Merge branch 'main' into apply-async-multitoken-kernel (4162cef1)
mergify removed label: needs-rebase
ci: trigger re-run (0a9c1ac6)
EricccYang force pushed to 0a9c1ac6 29 days ago
Merge branch 'main' into apply-async-multitoken-kernel (407de864)
Merge branch 'main' into apply-async-multitoken-kernel (3bf5f91d)
Merge branch 'main' into apply-async-multitoken-kernel (ce967117)
vllm-bot merged 4beeb068 into main 27 days ago
Reviewers: ProExpertProg, yewentao256, ZJY0516, gemini-code-assist, russellb, noooop, NickLucche, tjtanaa, gshtras, vadiklyutiy, tdoublep, patrickvonplaten, luccafong, sighingnow, tomeras91, jikunshang, bigPYJ1151, hmellor, markmc, ApostaC, orozery, jeejeelee, mgoin, youkaichao, WoosukKwon, robertgshaw2-redhat, njhill, ywang96, alexm-redhat, heheda12345, aarnphm, DarkLight1337, pavanimajety, tlrmchlsmth, benchislett, MatthewBonanni, 22quinn, houseroad, zhuohan123, LucasWilkinson, chaunceyjiang, zou3519, BoyuanFeng
Assignees: sangstar
Labels: performance, ready, ci/build, qwen, nvidia
Milestone: No milestone