llama.cpp
CUDA: add a fused top-K MoE kernel
#16130
Merged

Author: am17an
am17an requested a review from ggerganov 20 days ago
am17an requested a review from JohannesGaessler 20 days ago
github-actions added the testing label
github-actions added the Nvidia GPU label
github-actions added the ggml label
am17an force pushed from 4d6c41a6 to 7345668b 20 days ago
am17an commented
am17an force pushed from 324ecbb1 to 613b6c39 19 days ago
JohannesGaessler commented on 2025-09-22
am17an force pushed from 613b6c39 to 17c9e7c9 18 days ago
am17an requested a review from CISC 18 days ago
am17an requested a review from JohannesGaessler 18 days ago
am17an commented
JohannesGaessler approved these changes on 2025-09-22
ggerganov commented on 2025-09-22
am17an force pushed from 17c9e7c9 to 7a258bfa 17 days ago
am17an requested a review from slaren 17 days ago
Comments from am17an, JohannesGaessler, slaren and ggerganov
am17an force pushed from 7a258bfa to a275f10b 16 days ago
am17an commented
ggerganov commented on 2025-09-24
Comments from am17an, slaren, ggerganov and jeffbolznv
am17an force pushed from a275f10b to bb0e5d07 16 days ago
am17an requested a review from JohannesGaessler 16 days ago
am17an commented
JohannesGaessler commented on 2025-09-24
am17an force pushed from bb0e5d07 to 2ea8133b 15 days ago
am17an: CUDA: add a fused top-K MoE kernel (a208f5c9)
am17an: Refactor into ggml_cuda_should_use_topk_moe (9fc0396e)
am17an: Review: Use better coalescing pattern, use WARP_SIZE, store logits in… (8b780cce)
am17an: Review: format + micro-optimizations (ce867aa7)
am17an: Fix bug: fix tie breakers (2930668c)
am17an: Add optional norm + clean-up code (240b2c1f)
am17an force pushed from 2ea8133b to 4b2d2b9f 15 days ago
am17an force pushed from 4b2d2b9f to 639e9543 15 days ago
am17an: Use smem for final write (e772b28f)
am17an force pushed from 639e9543 to e772b28f 15 days ago
am17an: Add bounds check (53acfe61)
am17an force pushed from 941bc9e6 to 53acfe61 15 days ago
JohannesGaessler commented on 2025-09-25
am17an: Use better memory pattern for writeback (33856e1c)
JohannesGaessler approved these changes on 2025-09-25
Comments from am17an and JohannesGaessler
JohannesGaessler merged 077c94d0 into master 15 days ago
am17an deleted the cuda_topk_moe branch 14 days ago

Assignees: none