[Kernel] Triton-based Top-k and Top-p sampler kernels #33538
Attempt 1
a5fc250e
Top k works?
c95041b3
Top k works?
fe60b223
Ternary search
74a18b5a
Quadruple Search
7502c064
Quadruple Search
360e2343
Added outliers
11bd61fc
Added gather
a922b45e
Added gather
6f39f209
0.00115 for topk
30033c22
0.00115 for topk
29876175
topk working, adding topp:
ba5b98b5
Wrong results
46bcc7df
Wrong results
5de5ece6
Fixed?
cbcf7f52
Fixed?
643c21d4
Maybe?
2737c2d7
Duplicate logit issues.
f24d2e17
Duplicate logit issues.
a58ca6cf
Top-p duplicate handler implemented
b87c0954
Top-p fixed
6e3ca0a3
Need to implement topp-only, topk and topk-topp works.
034e8024
Correctness tested for top-p. Duplication handling for top-k remaining.
11145820
DeepSeek tests
56a615f0
Added env var VLLM_USE_TRITON_SAMPLER and automated test
6bea89cd
Merge remote-tracking branch 'origin/main' into topk_topp
8309b68e
Linter
5575c676
Tests
3342235a
Added Triton autotune
9bb0fbbc
Reduce diff and do fallback when batch size small.
340b6b46
Merge remote-tracking branch 'origin/main' into topk_topp
54df27fa
Test script fix
cf768c22
Added graph generation
9b3cf75a
Removed fallback
4235295a
Merge branch 'vllm-project:main' into topk_topp
1e3ed757
Added Gemini's suggestions, removed triton autotune.
344c3e4a
Merge branch 'topk_topp' of https://github.com/cakeng/vllm into topk_…
ba89c384
Fixed warps and stages
da1b1e6f
Fixed scratchpads
289c2ba8
Fixed scratchpads
5b0b1e6e
Merge branch 'main' into topk_topp
865b5230
Merge branch 'main' into topk_topp
350cbc8a
Init Sunga's correct triton top_k top_p implementation
5e6156cf
initial commit
7401ead1
init commit
d8fac6a2
not working.........
1d349d31
working on it....
b9a0c053
working........python compare.py
71c59786
...
115a98b3
...
f9b08f22
slow but working
b8728dbc
very slow
953025e0
pushed?
d1ca674f
Top-k working
5697d83e
Errors on top p
a2f6ae61
Everything correct but slow
2893ed54
Everything correct but slow
6e3c8744
Fast and working correctly
d0f491ee
Fast and working correctly
6743e12d
Errors
60b6515a
Filtered logits are wrong
71cbb9ef
Filtered logits are wrong
8b0771ce
Floating point associativity errors remain
20806a27
Merge main
f8cc4535
Remove tester
89443c01
Bugfix
204c221f
Test file removed.
e262fcb5
Typos
5e6dc79f
Typos
091b5188
Typos
5fc986e2
Typos
02d446bb
Typos
d0f02f6c
Bugfixes
152bc320
Deduplication
db9859f5
Duplication search bugfix
b936c94d
Bugfixes
3784e603
PyTorch sort permutes the order of duplicate values when sorting. Whe…
b0b6253c
The original PyTorch implementation applies softmax after sorting, which pro…
cd98ab90
Helper scripts
b72e2076
Helper scripts removed
b1152c15
Change hyperparameters
d2d56a12
Merge main
6421e1e9
[Perf] Triton-based top-p/top-k masking
7643eabd
fix doc
5a241a69
fix method name, only use triton when supported
b017713d
Merge remote-tracking branch 'refs/remotes/origin/main' into triton-t…
bd5d2413
fix precision
e067cbfd
Merge remote-tracking branch 'refs/remotes/origin/main' into triton-t…
a02aee88
Merge commit 'refs/pull/32558/head' of https://github.com/vllm-projec…
fbeb15f7
Copied topk + topp impl
463afa65
Copied topk + topp impl
9a5f30d7
Topp wrong
65874cce
Topp working, topp only
a671a098
Both Topk Topp working
cf6ab55a
Restored tests
150ccc6d
Bugfix
ae08705a
Loosened hyperparameters
49c3c39b
Linter
06565dfd
Restore
acd99d71
Merge branch 'main' into triton-topk-topp
ca3fff65
Merge branch 'topk_topp' into triton-topk-topp
b9d2275f
Update vllm/v1/sample/ops/topk_topp_triton.py
37f322a6
Bugfix
cb731c57
Refactor comments for clarity in topk_topp_triton.py
0c61b95c
Pre-commit fix
503f0b0f
Update arxiv
576f90eb
adjust prob distribution in benchmark, adjust threshold
c18fe71e
Merge remote-tracking branch 'refs/remotes/origin/main' into triton-t…
dba83d52
some simplification/cleanup
c246c3a7
fix precommit
4360e923
njhill approved these changes on 2026-02-13
Merge branch 'main' into triton-topk-topp
612f38ff
njhill enabled auto-merge (squash) 51 days ago
Merge remote-tracking branch 'refs/remotes/origin/main' into triton-t…
47ef82d7
fix -inf edge cases and possible infinite loop
b917a495
Merge remote-tracking branch 'origin/main' into triton-topk-topp
ef5d06e2
add async yield in cancellation test
9dadec16
Merge remote-tracking branch 'refs/remotes/origin/main' into triton-t…
8e027560
use temperature=0 in cancellation test
0e46d904
Merge remote-tracking branch 'origin/main' into triton-topk-topp
f7873c5f
njhill merged c656ba3b into main 46 days ago
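For readers following the commit history, the masking step that these commits iterate on can be sketched in plain Python. This is a reference-style illustration only, not the Triton kernel from this PR. The function name `top_k_top_p_mask` is hypothetical, and the semantics are assumptions: top-k is applied before top-p, ties at the top-k cutoff are all kept, and top-p keeps the smallest prefix of the probability-sorted tokens whose cumulative mass reaches `p`. It also assumes at least one finite logit.

```python
import math


def top_k_top_p_mask(logits, k=None, p=None):
    """Return a copy of `logits` with filtered entries set to -inf.

    Hypothetical helper for illustration; assumes at least one finite logit.
    """
    out = list(logits)
    n = len(out)

    if k is not None and 0 < k < n:
        # The k-th largest value is the cutoff. Duplicates of the cutoff
        # value are all kept, so more than k entries may survive.
        kth = sorted(out, reverse=True)[k - 1]
        out = [x if x >= kth else -math.inf for x in out]

    if p is not None and p < 1.0:
        # Softmax over the (possibly top-k masked) logits; exp(-inf) is 0.0.
        m = max(out)
        exps = [math.exp(x - m) for x in out]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Python's sort is stable, so equal probabilities keep index order.
        order = sorted(range(n), key=lambda i: probs[i], reverse=True)
        keep = set()
        cum = 0.0
        for i in order:
            keep.add(i)
            cum += probs[i]
            if cum >= p:
                break
        out = [x if i in keep else -math.inf for i, x in enumerate(out)]

    return out


masked = top_k_top_p_mask([2.0, 1.0, 0.0, -1.0], k=3, p=0.7)
```

In this sketch duplicates are resolved by index order because the sort is stable. The commit messages above suggest that tie ordering and the order of softmax versus sorting were sources of mismatch against the PyTorch path, so any comparison against another implementation should fix those choices explicitly.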
Assignees
No one assigned
Labels
performance
ready
v1