llama.cpp
e56abd20 - vulkan: Implement topk_moe fused shader, ported from CUDA (#16641)

Commit
57 days ago
This is similar to the CUDA shader from #16130, but doesn't use shared memory and handles different subgroup sizes.
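For readers unfamiliar with the operation, here is a minimal CPU sketch of what a top-k MoE routing step typically computes. It assumes the fused op is a softmax over per-token expert logits followed by selection of the k largest probabilities. The function name `topk_moe_reference` is hypothetical and is not taken from the commit. This is not the shader's code: it ignores subgroups entirely and only illustrates the expected result for one token.

```python
import math


def topk_moe_reference(logits, k):
    """Hypothetical reference: softmax over expert logits, then pick the k largest.

    Returns a list of (expert_index, probability) pairs in descending order of
    probability. Ties are broken in favour of the lower expert index.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")

    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Repeated argmax: take the current best expert, then mask it out.
    remaining = list(probs)
    selected = []
    for _ in range(k):
        best = max(range(len(remaining)), key=lambda i: remaining[i])
        selected.append((best, probs[best]))
        remaining[best] = -math.inf
    return selected
```

Calling `topk_moe_reference([1.0, 3.0, 2.0, 0.5], 2)` selects experts 1 and 2, in that order. A reference like this may be useful for checking a GPU implementation's output, although the actual kernels may differ in tie-breaking and in whether the selected weights are renormalized.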