CUDA: add a fused top-K MoE kernel #16130
am17an force-pushed from 4d6c41a6 to 7345668b 20 days ago
am17an force-pushed from 324ecbb1 to 613b6c39 19 days ago
am17an force-pushed from 613b6c39 to 17c9e7c9 18 days ago
am17an force-pushed from 17c9e7c9 to 7a258bfa 17 days ago
am17an force-pushed from 7a258bfa to a275f10b 16 days ago
am17an force-pushed from a275f10b to bb0e5d07 16 days ago
am17an force-pushed from bb0e5d07 to 2ea8133b 15 days ago
CUDA: add a fused top-K MoE kernel (a208f5c9)
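For context on what this commit fuses: a minimal sketch of the routing idea, assuming one warp per token, f32 logits, and at most 8*WARP_SIZE experts. The kernel name, launch shape, and every parameter below are illustrative and not taken from the PR.

```cpp
// Illustrative sketch only -- not the kernel from this PR.
// One warp handles one token: softmax over n_expert logits, then k rounds
// of warp-wide argmax to pick the k largest probabilities.
#include <cuda_runtime.h>
#include <float.h>

#define WARP_SIZE 32

__global__ void topk_moe_sketch(const float * logits,  // [n_tokens, n_expert]
                                float * weights,       // [n_tokens, k]
                                int   * ids,           // [n_tokens, k]
                                int n_expert, int k) { // assumes k <= n_expert
    const int token = blockIdx.x;
    const int lane  = threadIdx.x;                     // blockDim.x == WARP_SIZE
    const float * row = logits + (size_t) token * n_expert;

    // each lane owns experts lane, lane+32, lane+64, ... (coalesced loads)
    float p[8];                                        // assumes n_expert <= 8*WARP_SIZE
    const int n_local = (n_expert + WARP_SIZE - 1) / WARP_SIZE;

    // 1) row max for a numerically stable softmax
    float max_val = -FLT_MAX;
    for (int j = 0; j < n_local; ++j) {
        const int e = lane + j * WARP_SIZE;
        p[j] = e < n_expert ? row[e] : -FLT_MAX;
        max_val = fmaxf(max_val, p[j]);
    }
    for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
        max_val = fmaxf(max_val, __shfl_xor_sync(0xffffffff, max_val, off));
    }

    // 2) exponentiate and reduce the normalizer
    float sum = 0.0f;
    for (int j = 0; j < n_local; ++j) {
        const int e = lane + j * WARP_SIZE;
        p[j] = e < n_expert ? expf(p[j] - max_val) : 0.0f;
        sum += p[j];
    }
    for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
        sum += __shfl_xor_sync(0xffffffff, sum, off);
    }
    const float inv_sum = 1.0f / sum;

    // 3) k rounds of warp-wide argmax; each winner is masked out afterwards
    for (int i = 0; i < k; ++i) {
        float best_val = -1.0f;        // exp values are > 0, so -1 is a safe floor
        int   best_e   = n_expert;
        for (int j = 0; j < n_local; ++j) {
            const int e = lane + j * WARP_SIZE;
            if (e < n_expert && p[j] > best_val) { best_val = p[j]; best_e = e; }
        }
        for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
            const float other_val = __shfl_xor_sync(0xffffffff, best_val, off);
            const int   other_e   = __shfl_xor_sync(0xffffffff, best_e,   off);
            if (other_val > best_val || (other_val == best_val && other_e < best_e)) {
                best_val = other_val;
                best_e   = other_e;
            }
        }
        if (lane == 0) {
            weights[token * k + i] = best_val * inv_sum;
            ids[token * k + i]     = best_e;
        }
        // the lane that owns the selected expert clears it for the next round
        if (best_e % WARP_SIZE == lane) {
            p[best_e / WARP_SIZE] = -1.0f;
        }
    }
}
```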
Refactor into ggml_cuda_should_use_topk_moe (9fc0396e)
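ggml_cuda_should_use_topk_moe is the real helper name introduced by this commit; its actual signature and checks live in the PR. The sketch below is only a hypothetical, simplified gate with a made-up descriptor struct, meant to show the kind of conditions such a fusion check tends to verify before routing a graph onto the fused kernel.

```cpp
// Hypothetical sketch -- not the real ggml_cuda_should_use_topk_moe.
#include <cstdint>

struct moe_gate_desc {          // made-up descriptor for illustration
    int64_t n_expert;           // number of experts (softmax row length)
    int64_t n_expert_used;      // top-k per token
    bool    softmax_has_mask;   // the fused path assumes a plain softmax
    bool    dtype_is_f32;       // the fused path assumes f32 logits/weights
};

// Return true only for shapes the fused kernel is written for; anything else
// falls back to the unfused softmax + argsort path.
static bool should_use_topk_moe_sketch(const moe_gate_desc & d) {
    if (!d.dtype_is_f32 || d.softmax_has_mask)                return false;
    if (d.n_expert <= 0 || d.n_expert > 512)                  return false; // arbitrary cap for the sketch
    if (d.n_expert_used <= 0 || d.n_expert_used > d.n_expert) return false;
    return true;
}
```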
Review: Use better coalescing pattern, use WARP_SIZE, store logits in… (8b780cce)
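A small illustration of the coalescing point, not code from the PR: with one warp per row, indexing experts as lane + j*WARP_SIZE makes the 32 lanes touch adjacent addresses on every iteration, whereas giving each lane a private chunk of the row does not.

```cpp
// Access-pattern illustration only.
#include <cuda_runtime.h>
#define WARP_SIZE 32

__device__ float row_sum_coalesced(const float * row, int n, int lane) {
    float s = 0.0f;
    for (int e = lane; e < n; e += WARP_SIZE) {  // lanes read 32 consecutive floats per step
        s += row[e];
    }
    return s;  // per-lane partial; combine with a warp reduction afterwards
}

__device__ float row_sum_uncoalesced(const float * row, int n, int lane) {
    const int chunk = (n + WARP_SIZE - 1) / WARP_SIZE;
    float s = 0.0f;
    for (int j = 0; j < chunk; ++j) {            // lanes read addresses 'chunk' elements apart
        const int e = lane * chunk + j;
        if (e < n) s += row[e];
    }
    return s;
}
```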
Review: format + micro-optimizations (ce867aa7)
Fix bug: fix tie breakers (2930668c)
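A sketch of what a tie-aware warp argmax can look like (illustrative, not the PR's code): when two lanes hold equal values, the lower expert index wins, so the fused selection is deterministic and can match a reference softmax + argsort path.

```cpp
// Illustrative tie-aware warp argmax.
#include <cuda_runtime.h>
#define WARP_SIZE 32

__device__ void warp_argmax_tiebreak(float & val, int & idx) {
    for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
        const float other_val = __shfl_xor_sync(0xffffffff, val, off);
        const int   other_idx = __shfl_xor_sync(0xffffffff, idx, off);
        // a strictly greater value wins; on an exact tie the lower index wins
        if (other_val > val || (other_val == val && other_idx < idx)) {
            val = other_val;
            idx = other_idx;
        }
    }
}
```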
Add optional norm + clean-up code (240b2c1f)
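Sketch of the optional normalization step, under the assumption that "norm" here means rescaling the k selected probabilities to sum to 1 before the expert outputs are mixed (some MoE graphs do this, others don't); the function below is illustrative only.

```cpp
// Illustrative renormalization of the selected top-k weights.
#include <cuda_runtime.h>

__device__ void renorm_topk(float * w, int k, bool with_norm) {
    if (!with_norm) return;          // the kernel can skip this per graph
    float sum = 0.0f;
    for (int i = 0; i < k; ++i) sum += w[i];
    const float inv = 1.0f / sum;    // softmax outputs are positive, so sum > 0
    for (int i = 0; i < k; ++i) w[i] *= inv;
}
```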
am17an force-pushed from 2ea8133b to 4b2d2b9f 15 days ago
am17an force-pushed from 4b2d2b9f to 639e9543 15 days ago
Use smem for final write (e772b28f)
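Sketch of the shared-memory final write (illustrative, assumes one warp per block): instead of lane 0 issuing k serial stores as winners are found, they are parked in shared memory so lanes 0..k-1 can each issue one coalesced store at the end.

```cpp
// Illustrative "smem for final write" pattern.
#include <cuda_runtime.h>
#define WARP_SIZE 32
#define K_MAX     32   // assumed upper bound on selected experts for the sketch

// w/id are valid on lane 0 only; one warp per block assumed here
// (otherwise index the shared arrays by warp id).
__device__ void write_topk_via_smem(float * dst_w, int * dst_id,
                                    const float * w, const int * id,
                                    int k, int lane) {
    __shared__ float s_w [K_MAX];
    __shared__ int   s_id[K_MAX];
    if (lane == 0) {
        for (int i = 0; i < k; ++i) { s_w[i] = w[i]; s_id[i] = id[i]; }
    }
    __syncwarp();
    if (lane < k) {                  // consecutive lanes -> consecutive addresses
        dst_w[lane]  = s_w[lane];
        dst_id[lane] = s_id[lane];
    }
}
```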
am17an force-pushed from 639e9543 to e772b28f 15 days ago
Add bounds check (53acfe61)
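The kind of guard a bounds-check commit usually adds, sketched under the assumption that n_expert need not be a multiple of WARP_SIZE:

```cpp
// Illustrative tail guard.
#include <cuda_runtime.h>
#include <float.h>
#define WARP_SIZE 32

// Out-of-range slots must neither be read nor be able to win a max reduction,
// so they are replaced by -FLT_MAX instead of being loaded.
__device__ float load_logit_guarded(const float * row, int n_expert, int e) {
    return e < n_expert ? row[e] : -FLT_MAX;
}
```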
am17an force-pushed from 941bc9e6 to 53acfe61 15 days ago
Use better memory pattern for writeback (33856e1c)
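One possible reading of a better writeback pattern, sketched here and not necessarily what the PR does: k is small, so per-warp stores cover only a few contiguous elements; if a block routes several tokens, pooling the results in shared memory lets consecutive threads write one wider contiguous range per block.

```cpp
// Illustrative block-wide writeback; selection results are placeholders.
#include <cuda_runtime.h>
#define WARP_SIZE        32
#define WARPS_PER_BLOCK  4   // launch with WARPS_PER_BLOCK*WARP_SIZE threads
#define K_MAX            8   // assumed top-k bound for the sketch

__global__ void writeback_block_wide(float * weights, int * ids, int n_tokens, int k) {
    __shared__ float s_w [WARPS_PER_BLOCK * K_MAX];
    __shared__ int   s_id[WARPS_PER_BLOCK * K_MAX];

    const int warp  = threadIdx.x / WARP_SIZE;   // which token this warp handles
    const int lane  = threadIdx.x % WARP_SIZE;
    const int token = blockIdx.x * WARPS_PER_BLOCK + warp;

    if (token < n_tokens && lane < k) {
        // placeholders for the values the selection loop would have produced
        s_w [warp * k + lane] = 0.0f;
        s_id[warp * k + lane] = lane;
    }
    __syncthreads();

    // one contiguous region of (tokens in this block) * k elements per block
    const int base  = blockIdx.x * WARPS_PER_BLOCK * k;
    const int n_out = min(n_tokens - blockIdx.x * WARPS_PER_BLOCK, WARPS_PER_BLOCK) * k;
    for (int i = threadIdx.x; i < n_out; i += blockDim.x) {
        weights[base + i] = s_w[i];
        ids[base + i]     = s_id[i];
    }
}
```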
am17an deleted the cuda_topk_moe branch 14 days ago
Assignees: none
Labels: testing, Nvidia GPU, ggml