llama.cpp
CUDA: add a fused top-K MoE kernel
#16130
Merged

Commits
  • CUDA: add a fused top-K MoE kernel
    am17an committed 46 days ago
  • Refactor into ggml_cuda_should_use_topk_moe
    am17an committed 46 days ago
  • Review: Use better coalescing pattern, use WARP_SIZE, store logits into registers before
    am17an committed 46 days ago
  • Review: format + micro-optimizations
    am17an committed 46 days ago
  • Fix bug: fix tie breakers
    am17an committed 46 days ago
  • Add optional norm + clean-up code
    am17an committed 46 days ago
  • Use smem for final write
    am17an committed 46 days ago
  • Add bounds check
    am17an committed 45 days ago
  • Use better memory pattern for writeback
    am17an committed 45 days ago