vllm
[Attention] Add FlashInfer Sparse MLA backend
#33451
Merged

MatthewBonanni Initial implementation
fbd9b57c
MatthewBonanni Add implementation
dee20921
MatthewBonanni Update test
aaca43b5
MatthewBonanni Remove unnecessary skip
6f3bbc9a
mergify added documentation
mergify added nvidia
mergify added v1
gemini-code-assist commented on 2026-01-30
MatthewBonanni Cleanup
77d5461b
MatthewBonanni Merge branch 'main' into fi_sparse
a6072d8b
MatthewBonanni Address refactor
38e6fc99
MatthewBonanni Move kernel to sparse_utils
ead72ab6
MatthewBonanni Fix from refactor
49cd2094
MatthewBonanni Super init call not necessary
14044783
MatthewBonanni More fixes
349da878
MatthewBonanni Fix uniform query lengths
d4536e27
MatthewBonanni Update check_and_update_config
98530a27
MatthewBonanni Update block size support
bb4e94e5
MatthewBonanni Parameterize block sizes
ad8d06ed
MatthewBonanni Update benchmark
f604adb0
mergify added performance
MatthewBonanni Clean up
f281b288
MatthewBonanni Fix
9a6fb87c
MatthewBonanni Use single batch layout
5dcf0587
MatthewBonanni Remove unnecessary abstraction
9bdfa661
MatthewBonanni Fix indexing
9043242e
MatthewBonanni Improve test
1fb6fb01
mergify added needs-rebase
LucasWilkinson fix up attention benchmarks
c35a9c6c
MatthewBonanni clean
24a54002
MatthewBonanni add smoke
e8e0f4e8
MatthewBonanni Fix and update test
1daa3d0a
MatthewBonanni marked this pull request as ready for review 106 days ago
MatthewBonanni requested a review from pavanimajety 106 days ago
MatthewBonanni requested a review from LucasWilkinson 106 days ago
MatthewBonanni Merge branch 'main' into fi_sparse
23780f19
mergify removed needs-rebase
MatthewBonanni Merge branch 'lwilkinson/fix-up-attention-benchmarks' into fi_sparse
f245674a
mergify added ci/build
MatthewBonanni Fix benchmarks
f326062c
MatthewBonanni More benchmark ux improvements
553b1401
MatthewBonanni Update mla_decode
1e29943f
MatthewBonanni Sort benchmark output
c19e1131
MatthewBonanni Add mla prefill case
5cb64d62
MatthewBonanni Prefer FlashInfer at low head counts
9a31d73f
MatthewBonanni requested a review from hmellor 101 days ago
mergify added needs-rebase
MatthewBonanni Merge branch 'main' into fi_sparse
ab3fb88b
mergify removed needs-rebase
MatthewBonanni Update other platforms
6d7b2c32
MatthewBonanni requested a review from tjtanaa 101 days ago
MatthewBonanni requested a review from jikunshang 101 days ago
MatthewBonanni requested a review from bigPYJ1151 101 days ago
mergify added rocm
mergify added cpu
MatthewBonanni Merge branch 'main' into fi_sparse
fa0655d3
LucasWilkinson changed the title from "[WIP][Attention] Add FlashInfer Sparse MLA backend" to "[Attention] Add FlashInfer Sparse MLA backend" 101 days ago
mergify added needs-rebase
MatthewBonanni Merge branch 'main' into fi_sparse
db66e7bc
mergify removed needs-rebase
LucasWilkinson Merge branch 'main' into fi_sparse
c69830b2
LucasWilkinson approved these changes on 2026-02-11
LucasWilkinson enabled auto-merge (squash) 99 days ago
github-actions added ready
LucasWilkinson merged f2c47886 into main 98 days ago
