CUDA: add a fused top-K MoE kernel #16130
am17an force-pushed from 4d6c41a6 to 7345668b 20 days ago
am17an force-pushed from 324ecbb1 to 613b6c39 19 days ago
am17an force-pushed from 613b6c39 to 17c9e7c9 18 days ago
am17an force-pushed from 17c9e7c9 to 7a258bfa 17 days ago
am17an force-pushed from 7a258bfa to a275f10b 16 days ago
am17an force-pushed from a275f10b to bb0e5d07 16 days ago
am17an force-pushed from bb0e5d07 to 2ea8133b 15 days ago
CUDA: add a fused top-K MoE kernel (a208f5c9)
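For context on what this commit fuses: a minimal sketch of the routing idea, assuming one warp per token, f32 logits, and at most 8*WARP_SIZE experts. The kernel name, launch shape, and every parameter below are illustrative and not taken from the PR.

```cpp
// Illustrative sketch only -- not the kernel from this PR.
// One warp handles one token: softmax over n_expert logits, then k rounds
// of warp-wide argmax to pick the k largest probabilities.
#include <cuda_runtime.h>
#include <float.h>

#define WARP_SIZE 32

__global__ void topk_moe_sketch(const float * logits,  // [n_tokens, n_expert]
                                float * weights,       // [n_tokens, k]
                                int   * ids,           // [n_tokens, k]
                                int n_expert, int k) { // assumes k <= n_expert
    const int token = blockIdx.x;
    const int lane  = threadIdx.x;                     // blockDim.x == WARP_SIZE
    const float * row = logits + (size_t) token * n_expert;

    // each lane owns experts lane, lane+32, lane+64, ... (coalesced loads)
    float p[8];                                        // assumes n_expert <= 8*WARP_SIZE
    const int n_local = (n_expert + WARP_SIZE - 1) / WARP_SIZE;

    // 1) row max for a numerically stable softmax
    float max_val = -FLT_MAX;
    for (int j = 0; j < n_local; ++j) {
        const int e = lane + j * WARP_SIZE;
        p[j] = e < n_expert ? row[e] : -FLT_MAX;
        max_val = fmaxf(max_val, p[j]);
    }
    for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
        max_val = fmaxf(max_val, __shfl_xor_sync(0xffffffff, max_val, off));
    }

    // 2) exponentiate and reduce the normalizer
    float sum = 0.0f;
    for (int j = 0; j < n_local; ++j) {
        const int e = lane + j * WARP_SIZE;
        p[j] = e < n_expert ? expf(p[j] - max_val) : 0.0f;
        sum += p[j];
    }
    for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
        sum += __shfl_xor_sync(0xffffffff, sum, off);
    }
    const float inv_sum = 1.0f / sum;

    // 3) k rounds of warp-wide argmax; each winner is masked out afterwards
    for (int i = 0; i < k; ++i) {
        float best_val = -1.0f;        // exp values are > 0, so -1 is a safe floor
        int   best_e   = n_expert;
        for (int j = 0; j < n_local; ++j) {
            const int e = lane + j * WARP_SIZE;
            if (e < n_expert && p[j] > best_val) { best_val = p[j]; best_e = e; }
        }
        for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
            const float other_val = __shfl_xor_sync(0xffffffff, best_val, off);
            const int   other_e   = __shfl_xor_sync(0xffffffff, best_e,   off);
            if (other_val > best_val || (other_val == best_val && other_e < best_e)) {
                best_val = other_val;
                best_e   = other_e;
            }
        }
        if (lane == 0) {
            weights[token * k + i] = best_val * inv_sum;
            ids[token * k + i]     = best_e;
        }
        // the lane that owns the selected expert clears it for the next round
        if (best_e % WARP_SIZE == lane) {
            p[best_e / WARP_SIZE] = -1.0f;
        }
    }
}
```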
Refactor into ggml_cuda_should_use_topk_moe (9fc0396e)
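ggml_cuda_should_use_topk_moe is the real helper name introduced by this commit; its actual signature and checks live in the PR. The sketch below is only a hypothetical, simplified gate with a made-up descriptor struct, meant to show the kind of conditions such a fusion check tends to verify before routing a graph onto the fused kernel.

```cpp
// Hypothetical sketch -- not the real ggml_cuda_should_use_topk_moe.
#include <cstdint>

struct moe_gate_desc {          // made-up descriptor for illustration
    int64_t n_expert;           // number of experts (softmax row length)
    int64_t n_expert_used;      // top-k per token
    bool    softmax_has_mask;   // the fused path assumes a plain softmax
    bool    dtype_is_f32;       // the fused path assumes f32 logits/weights
};

// Return true only for shapes the fused kernel is written for; anything else
// falls back to the unfused softmax + argsort path.
static bool should_use_topk_moe_sketch(const moe_gate_desc & d) {
    if (!d.dtype_is_f32 || d.softmax_has_mask)                return false;
    if (d.n_expert <= 0 || d.n_expert > 512)                  return false; // arbitrary cap for the sketch
    if (d.n_expert_used <= 0 || d.n_expert_used > d.n_expert) return false;
    return true;
}
```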
Review: Use better coalescing pattern, use WARP_SIZE, store logits in… (8b780cce)
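A small illustration of the coalescing point, not code from the PR: with one warp per row, indexing experts as lane + j*WARP_SIZE makes the 32 lanes touch adjacent addresses on every iteration, whereas giving each lane a private chunk of the row does not.

```cpp
// Access-pattern illustration only.
#include <cuda_runtime.h>
#define WARP_SIZE 32

__device__ float row_sum_coalesced(const float * row, int n, int lane) {
    float s = 0.0f;
    for (int e = lane; e < n; e += WARP_SIZE) {  // lanes read 32 consecutive floats per step
        s += row[e];
    }
    return s;  // per-lane partial; combine with a warp reduction afterwards
}

__device__ float row_sum_uncoalesced(const float * row, int n, int lane) {
    const int chunk = (n + WARP_SIZE - 1) / WARP_SIZE;
    float s = 0.0f;
    for (int j = 0; j < chunk; ++j) {            // lanes read addresses 'chunk' elements apart
        const int e = lane * chunk + j;
        if (e < n) s += row[e];
    }
    return s;
}
```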
Review: format + micro-optimizations (ce867aa7)
Fix bug: fix tie breakers (2930668c)
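A sketch of what a tie-aware warp argmax can look like (illustrative, not the PR's code): when two lanes hold equal values, the lower expert index wins, so the fused selection is deterministic and can match a reference softmax + argsort path.

```cpp
// Illustrative tie-aware warp argmax.
#include <cuda_runtime.h>
#define WARP_SIZE 32

__device__ void warp_argmax_tiebreak(float & val, int & idx) {
    for (int off = WARP_SIZE / 2; off > 0; off >>= 1) {
        const float other_val = __shfl_xor_sync(0xffffffff, val, off);
        const int   other_idx = __shfl_xor_sync(0xffffffff, idx, off);
        // a strictly greater value wins; on an exact tie the lower index wins
        if (other_val > val || (other_val == val && other_idx < idx)) {
            val = other_val;
            idx = other_idx;
        }
    }
}
```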
Add optional norm + clean-up code (240b2c1f)
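Sketch of the optional normalization step, under the assumption that "norm" here means rescaling the k selected probabilities to sum to 1 before the expert outputs are mixed (some MoE graphs do this, others don't); the function below is illustrative only.

```cpp
// Illustrative renormalization of the selected top-k weights.
#include <cuda_runtime.h>

__device__ void renorm_topk(float * w, int k, bool with_norm) {
    if (!with_norm) return;          // the kernel can skip this per graph
    float sum = 0.0f;
    for (int i = 0; i < k; ++i) sum += w[i];
    const float inv = 1.0f / sum;    // softmax outputs are positive, so sum > 0
    for (int i = 0; i < k; ++i) w[i] *= inv;
}
```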
am17an force-pushed from 2ea8133b to 4b2d2b9f 15 days ago
am17an force-pushed from 4b2d2b9f to 639e9543 15 days ago
Use smem for final write (e772b28f)
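Sketch of the shared-memory final write (illustrative, assumes one warp per block): instead of lane 0 issuing k serial stores as winners are found, they are parked in shared memory so lanes 0..k-1 can each issue one coalesced store at the end.

```cpp
// Illustrative "smem for final write" pattern.
#include <cuda_runtime.h>
#define WARP_SIZE 32
#define K_MAX     32   // assumed upper bound on selected experts for the sketch

// w/id are valid on lane 0 only; one warp per block assumed here
// (otherwise index the shared arrays by warp id).
__device__ void write_topk_via_smem(float * dst_w, int * dst_id,
                                    const float * w, const int * id,
                                    int k, int lane) {
    __shared__ float s_w [K_MAX];
    __shared__ int   s_id[K_MAX];
    if (lane == 0) {
        for (int i = 0; i < k; ++i) { s_w[i] = w[i]; s_id[i] = id[i]; }
    }
    __syncwarp();
    if (lane < k) {                  // consecutive lanes -> consecutive addresses
        dst_w[lane]  = s_w[lane];
        dst_id[lane] = s_id[lane];
    }
}
```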
am17an force-pushed from 639e9543 to e772b28f 15 days ago
Add bounds check (53acfe61)
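The kind of guard a bounds-check commit usually adds, sketched under the assumption that n_expert need not be a multiple of WARP_SIZE:

```cpp
// Illustrative tail guard.
#include <cuda_runtime.h>
#include <float.h>
#define WARP_SIZE 32

// Out-of-range slots must neither be read nor be able to win a max reduction,
// so they are replaced by -FLT_MAX instead of being loaded.
__device__ float load_logit_guarded(const float * row, int n_expert, int e) {
    return e < n_expert ? row[e] : -FLT_MAX;
}
```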
am17an force-pushed from 941bc9e6 to 53acfe61 15 days ago
Use better memory pattern for writeback (33856e1c)
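One possible reading of a better writeback pattern, sketched here and not necessarily what the PR does: k is small, so per-warp stores cover only a few contiguous elements; if a block routes several tokens, pooling the results in shared memory lets consecutive threads write one wider contiguous range per block.

```cpp
// Illustrative block-wide writeback; selection results are placeholders.
#include <cuda_runtime.h>
#define WARP_SIZE        32
#define WARPS_PER_BLOCK  4   // launch with WARPS_PER_BLOCK*WARP_SIZE threads
#define K_MAX            8   // assumed top-k bound for the sketch

__global__ void writeback_block_wide(float * weights, int * ids, int n_tokens, int k) {
    __shared__ float s_w [WARPS_PER_BLOCK * K_MAX];
    __shared__ int   s_id[WARPS_PER_BLOCK * K_MAX];

    const int warp  = threadIdx.x / WARP_SIZE;   // which token this warp handles
    const int lane  = threadIdx.x % WARP_SIZE;
    const int token = blockIdx.x * WARPS_PER_BLOCK + warp;

    if (token < n_tokens && lane < k) {
        // placeholders for the values the selection loop would have produced
        s_w [warp * k + lane] = 0.0f;
        s_id[warp * k + lane] = lane;
    }
    __syncthreads();

    // one contiguous region of (tokens in this block) * k elements per block
    const int base  = blockIdx.x * WARPS_PER_BLOCK * k;
    const int n_out = min(n_tokens - blockIdx.x * WARPS_PER_BLOCK, WARPS_PER_BLOCK) * k;
    for (int i = threadIdx.x; i < n_out; i += blockDim.x) {
        weights[base + i] = s_w[i];
        ids[base + i]     = s_id[i];
    }
}
```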
am17an deleted the cuda_topk_moe branch 14 days ago
Assignees: none
Labels: testing, Nvidia GPU, ggml