onnxruntime
[CUDA] GQA CUDA Kernel Fusion and Performance Optimization
#26920
Merged

tianleiwu merged 10 commits into main from tlwu/cuda_gqa_fused_kernel
tianleiwu GQA cuda fused kernel for kv cache and rotary
a412553a
tianleiwu use fused kernel for packed qkv, rotary and first prompt
6a3742d4
tianleiwu marked this pull request as draft 133 days ago
tianleiwu flash attention fast decode
5ee35da2
tianleiwu marked this pull request as ready for review 133 days ago
tianleiwu requested a review from kunal-vaishnavi 133 days ago
tianleiwu requested a review from nenad1002 133 days ago
tianleiwu requested a review from apsonawane 133 days ago
tianleiwu requested a review from copilot-pull-request-reviewer 133 days ago
apsonawane commented on 2026-01-07
tianleiwu update #include
9f6f0734
copilot-pull-request-reviewer commented on 2026-01-07
tianleiwu review feedback
8d20af56
apsonawane dismissed these changes on 2026-01-07
kunal-vaishnavi dismissed these changes on 2026-01-07
tianleiwu Merge branch 'main' into tlwu/cuda_gqa_fused_kernel
eb5b1838
tianleiwu marked this pull request as draft 131 days ago
tianleiwu changed the title from "[CUDA] GQA Fused Kernel for QKV Unpack, RoPE, and KV Cache Append" to "[CUDA] GQA CUDA Kernel Fusion and Performance Optimization" 131 days ago
tianleiwu Improve kernel, document and tests
e768ee05
tianleiwu dismissed their stale review via e768ee05 131 days ago
tianleiwu dismissed their stale review via e768ee05 131 days ago
tianleiwu avoid overflow
699e395d
tianleiwu clean up and assert alignment
69087ca8
tianleiwu optimize buffer size
684c7cb5
tianleiwu marked this pull request as ready for review 131 days ago
tianleiwu requested a review from kunal-vaishnavi 131 days ago
tianleiwu requested a review from apsonawane 131 days ago
tianleiwu requested a review from copilot-pull-request-reviewer 131 days ago
copilot-pull-request-reviewer commented on 2026-01-09
kunal-vaishnavi commented on 2026-01-09
kunal-vaishnavi commented on 2026-01-09
kunal-vaishnavi approved these changes on 2026-01-09
tianleiwu merged 39d8520b into main 130 days ago
tianleiwu deleted the tlwu/cuda_gqa_fused_kernel branch 130 days ago
