vllm
fused qknorm+rope kernel optimization for SM9.0
#37376
Merged

gemini-code-assist commented on 2026-03-18
yewentao256 commented on 2026-03-18
ZJY0516 commented on 2026-03-31
ProExpertProg approved these changes on 2026-03-19
EricccYang force pushed 39 days ago
EricccYang requested a review from russellb 39 days ago
EricccYang requested a review from noooop 39 days ago
EricccYang requested a review from NickLucche 39 days ago
EricccYang requested a review from tjtanaa 39 days ago
EricccYang requested a review from gshtras 39 days ago
EricccYang requested a review from vadiklyutiy 39 days ago
EricccYang requested a review from tdoublep 39 days ago
EricccYang requested a review from patrickvonplaten 39 days ago
EricccYang requested a review from luccafong 39 days ago
EricccYang requested a review from sighingnow 39 days ago
EricccYang requested a review from tomeras91 39 days ago
EricccYang requested a review from jikunshang 39 days ago
EricccYang requested a review from bigPYJ1151 39 days ago
EricccYang requested a review from hmellor 39 days ago
EricccYang requested a review from markmc 39 days ago
EricccYang requested a review from ApostaC 39 days ago
EricccYang requested a review from orozery 39 days ago
EricccYang requested a review from jeejeelee 39 days ago
EricccYang requested a review from mgoin 39 days ago
EricccYang requested a review from youkaichao 39 days ago
EricccYang requested a review from WoosukKwon 39 days ago
EricccYang requested a review from robertgshaw2-redhat 39 days ago
EricccYang requested a review from njhill 39 days ago
EricccYang requested a review from ywang96 39 days ago
EricccYang requested a review from alexm-redhat 39 days ago
EricccYang requested a review from heheda12345 39 days ago
EricccYang requested a review from aarnphm 39 days ago
EricccYang requested a review from DarkLight1337 39 days ago
EricccYang requested a review from pavanimajety 39 days ago
EricccYang requested a review from tlrmchlsmth 39 days ago
EricccYang requested a review from benchislett 39 days ago
EricccYang requested a review from MatthewBonanni 39 days ago
EricccYang requested a review from 22quinn 39 days ago
EricccYang requested a review from houseroad 39 days ago
EricccYang requested a review from zhuohan123 39 days ago
EricccYang requested a review from LucasWilkinson 39 days ago
EricccYang requested a review from chaunceyjiang 39 days ago
EricccYang requested a review from zou3519 39 days ago
EricccYang requested a review from BoyuanFeng 39 days ago
mergify added documentation
mergify added ci/build
mergify added deepseek
mergify added frontend
mergify added llama
mergify added multi-modality
mergify added new-model
mergify added performance
mergify added qwen
mergify added gpt-oss
mergify added nvidia
mergify added rocm
mergify added intel-gpu
mergify added cpu
mergify added structured-output
mergify added speculative-decoding
mergify added v1
mergify added tpu
mergify added tool-calling
mergify assigned sangstar 39 days ago
mergify added kv-connector
mergify added needs-rebase
EricccYang force pushed 39 days ago
mergify removed tpu
mergify removed needs-rebase
EricccYang force pushed 39 days ago
EricccYang force pushed 39 days ago
ZJY0516 removed documentation
ZJY0516 removed new-model
ZJY0516 removed rocm
ZJY0516 removed structured-output
ZJY0516 removed frontend
ZJY0516 removed intel-gpu
ZJY0516 removed speculative-decoding
ZJY0516 removed v1
ZJY0516 removed multi-modality
ZJY0516 removed tool-calling
ZJY0516 removed llama
ZJY0516 removed deepseek
ZJY0516 removed cpu
ZJY0516 removed gpt-oss
ZJY0516 removed kv-connector
ZJY0516 added ready
yewentao256 commented on 2026-04-01
EricccYang force pushed 35 days ago
EricccYang force pushed 35 days ago
ProExpertProg approved these changes on 2026-04-07
EricccYang fused_qknorm_rope: add cp.async, multi-token-head kernel, and thresho…
aac46880
EricccYang refactor: align launch function structure with improve kernel style
9667299f
EricccYang add head_dim-aware threshold dispatch for token_heads_per_warp
25fc49a5
EricccYang add SM 9.0 guard for token_heads_per_warp dispatch and forced_token_h…
a9d0e46f
EricccYang clear comment
9ffc8f8f
EricccYang refactor(csrc): extract cp.async helpers to async_util.cuh
5a5a5ec6
EricccYang fix(csrc): make cuda_async helpers valid on host and non-sm80 paths
a7c3ba7f
EricccYang remove comment
f249b9d7
EricccYang fused_qknorm_rope: add cp.async, multi-token-head kernel, and thresho…
447e6e24
EricccYang refactor: align launch function structure with improve kernel style
3108ec3f
EricccYang fix(csrc): use getDeviceProperties(device_id) for correct multi-GPU d…
13433d05
EricccYang fix(csrc): add cp.async 16B alignment check for NTokenHeads kernel
7f4093fa
EricccYang fix(csrc): fallback to base kernel on cp.async alignment mismatch
6266b4a8
EricccYang fix: pass forced_token_heads_per_warp through fusion and defunctional…
bc01680a
EricccYang fix(csrc): remove duplicate launch function definitions from rebase
133dccc1
EricccYang test: temporarily force token_heads_per_warp=1 for nsys verification
4d62fed5
EricccYang revert: restore forced_token_heads_per_warp=-1 (auto) after verification
b4e70f8a
EricccYang style: apply clang-format to C++ sources
7115185f
EricccYang ci: trigger re-run
83dcdcb3
EricccYang force pushed to 83dcdcb3 31 days ago
mergify added needs-rebase
EricccYang Merge branch 'main' into apply-async-multitoken-kernel
4162cef1
mergify removed needs-rebase
EricccYang ci: trigger re-run
0a9c1ac6
EricccYang force pushed to 0a9c1ac6 29 days ago
EricccYang Merge branch 'main' into apply-async-multitoken-kernel
407de864
EricccYang Merge branch 'main' into apply-async-multitoken-kernel
3bf5f91d
EricccYang Merge branch 'main' into apply-async-multitoken-kernel
ce967117
vllm-bot merged 4beeb068 into main 27 days ago
