llama.cpp
CUDA: add a fused top-K MoE kernel
#16130
Merged
Commits (9)
CUDA: add a fused top-K MoE kernel
am17an committed 46 days ago
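For context, "fused" here means the softmax over the expert logits and the top-k selection of the MoE router happen in one kernel instead of separate ops. Below is a minimal sketch of that idea, not the kernel added by this PR; it assumes one warp per token, at most 32 experts, and every name in it is made up.

```cpp
#include <cuda_runtime.h>
#include <cfloat>

static constexpr int WARP_SIZE = 32;

// One warp per row (token): lane e holds the logit of expert e.
__global__ void topk_moe_sketch(const float * logits,  // [n_rows][n_experts]
                                float       * weights, // [n_rows][k]
                                int         * ids,     // [n_rows][k]
                                int n_experts, int k) {
    const int row  = blockIdx.x;
    const int lane = threadIdx.x;

    // Load one logit per lane; unused lanes hold -inf so they can never win.
    float x = lane < n_experts ? logits[row*n_experts + lane] : -FLT_MAX;

    // Warp softmax: max, exp, sum via shuffle reductions.
    float m = x;
    for (int off = WARP_SIZE/2; off > 0; off >>= 1)
        m = fmaxf(m, __shfl_xor_sync(0xffffffff, m, off));
    float e = lane < n_experts ? expf(x - m) : 0.0f;
    float s = e;
    for (int off = WARP_SIZE/2; off > 0; off >>= 1)
        s += __shfl_xor_sync(0xffffffff, s, off);
    float p = e/s;

    // k rounds of warp argmax; each round's winner is masked out afterwards.
    for (int i = 0; i < k; ++i) {
        float best = p;
        int   arg  = lane;
        for (int off = WARP_SIZE/2; off > 0; off >>= 1) {
            const float ob = __shfl_xor_sync(0xffffffff, best, off);
            const int   oa = __shfl_xor_sync(0xffffffff, arg,  off);
            if (ob > best) { best = ob; arg = oa; }
        }
        // Broadcast lane 0's winner so every lane agrees on who was picked.
        best = __shfl_sync(0xffffffff, best, 0);
        arg  = __shfl_sync(0xffffffff, arg,  0);
        if (lane == 0) {
            weights[row*k + i] = best;
            ids    [row*k + i] = arg;
        }
        if (lane == arg) {
            p = -FLT_MAX; // remove this expert from the remaining rounds
        }
    }
}

// Hypothetical launch: topk_moe_sketch<<<n_rows, WARP_SIZE>>>(logits, weights, ids, n_experts, k);
```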
Refactor into ggml_cuda_should_use_topk_moe
am17an committed 46 days ago
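ggml_cuda_should_use_topk_moe is the helper this commit factors the eligibility check into; its real signature and conditions live in the PR itself. Purely as an illustration of the pattern (a predicate that gates the fused path, with a fallback to the unfused ops otherwise), with entirely hypothetical parameters:

```cpp
// Hypothetical illustration only: none of these parameter names or limits come
// from the PR. The real ggml_cuda_should_use_topk_moe inspects the ggml graph.
static bool should_use_topk_moe_sketch(bool logits_are_f32, bool contiguous,
                                       int n_experts, int n_experts_used) {
    return logits_are_f32 && contiguous
        && n_experts_used <= n_experts
        && n_experts <= 512;   // assumed limit of the single-kernel reduction
}
```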
Review: Use better coalescing pattern, use WARP_SIZE, store logits into registers before
am17an committed 46 days ago
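The pattern suggested in review: with more experts than lanes, each lane reads a WARP_SIZE-strided slice of the row so that consecutive lanes touch consecutive addresses (coalesced), and the logits stay in registers so the later passes (max, sum, selection) do not re-read global memory. A sketch under assumed bounds:

```cpp
#include <cuda_runtime.h>
#include <cfloat>

static constexpr int WARP_SIZE   = 32;
static constexpr int MAX_EXPERTS = 256;                       // assumed upper bound
static constexpr int N_PER_LANE  = MAX_EXPERTS / WARP_SIZE;

// Each lane loads experts lane, lane+32, lane+64, ... into registers once.
// Consecutive lanes read consecutive floats, so the loads coalesce.
__device__ void load_logits_coalesced(const float * row_logits, int n_experts,
                                      float (&regs)[N_PER_LANE]) {
    const int lane = threadIdx.x % WARP_SIZE;
    #pragma unroll
    for (int i = 0; i < N_PER_LANE; ++i) {
        const int e = i*WARP_SIZE + lane;
        regs[i] = e < n_experts ? row_logits[e] : -FLT_MAX;   // padding never wins
    }
}
```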
Review: format + micro-optimizations
am17an committed 46 days ago
Fix bug: fix tie breakers
am17an committed 46 days ago
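On the tie-breaker bug: in a parallel argmax, two experts with exactly equal values can make different lanes (or the fused and unfused paths) pick different winners. A deterministic comparator, for example preferring the lower expert index on equal values, removes the ambiguity. A sketch of that kind of fix, not the PR's exact change:

```cpp
#include <cuda_runtime.h>

static constexpr int WARP_SIZE = 32;

// Warp argmax with an explicit tie-breaker: a strictly greater value wins,
// and on equal values the lower index wins, so every lane converges to the
// same (value, index) pair regardless of reduction order.
__device__ void warp_argmax_tiebreak(float & val, int & idx) {
    #pragma unroll
    for (int off = WARP_SIZE/2; off > 0; off >>= 1) {
        const float other_val = __shfl_xor_sync(0xffffffff, val, off);
        const int   other_idx = __shfl_xor_sync(0xffffffff, idx, off);
        if (other_val > val || (other_val == val && other_idx < idx)) {
            val = other_val;
            idx = other_idx;
        }
    }
}
```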
Add optional norm + clean-up code
am17an committed 46 days ago
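The optional norm refers to MoE variants that rescale the selected top-k weights to sum to 1 after selection; this commit folds that step into the fused kernel rather than leaving it as a separate op. A sketch of the step itself (the template flag and the k <= 8 bound are assumptions, not the PR's interface):

```cpp
#include <cuda_runtime.h>

// Optional renormalization of the selected expert weights so they sum to 1.
// Compiled out entirely when with_norm is false.
template <bool with_norm>
__device__ void maybe_normalize_topk(float (&w)[8], int k) {   // assumes k <= 8
    if (!with_norm) {
        return;
    }
    float sum = 0.0f;
    for (int i = 0; i < k; ++i) {
        sum += w[i];
    }
    const float inv = 1.0f/sum;
    for (int i = 0; i < k; ++i) {
        w[i] *= inv;
    }
}
```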
Use smem for final write
am17an committed 46 days ago
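On using shared memory for the final write: during selection each round's winner is known to a single lane, so storing straight from that lane produces scattered stores. Staging the k results in shared memory and then letting the first k threads write consecutive elements keeps the final store coalesced. A standalone sketch of that pattern (the per-round winners are passed in so the example compiles on its own; this is not the PR's code):

```cpp
#include <cuda_runtime.h>

static constexpr int MAX_K = 8;   // assumed upper bound on experts used per token

__global__ void writeback_via_smem(const float * win_w, const int * win_i, // stand-ins for
                                   float * weights, int * ids, int k) {    // per-round winners
    __shared__ float s_w[MAX_K];
    __shared__ int   s_i[MAX_K];

    const int row = blockIdx.x;

    // Stand-in for the selection loop: the lane that owns round i's winner
    // deposits it into shared memory (here simply done by thread 0).
    if (threadIdx.x == 0) {
        for (int i = 0; i < k; ++i) {
            s_w[i] = win_w[row*k + i];
            s_i[i] = win_i[row*k + i];
        }
    }
    __syncthreads();

    // Final write: the first k threads store consecutive elements, so the
    // store to global memory is coalesced instead of scattered.
    if (threadIdx.x < k) {
        weights[row*k + threadIdx.x] = s_w[threadIdx.x];
        ids    [row*k + threadIdx.x] = s_i[threadIdx.x];
    }
}
```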
Add bounds check
am17an committed 45 days ago
Use better memory pattern for writeback
am17an committed 45 days ago