llama.cpp
b365c3ff - vulkan/cuda: fix topk_moe with exp_probs_b (#18071)

Commit
7 days ago
vulkan/cuda: fix topk_moe with exp_probs_b (#18071)

I updated test_topk_moe to more closely match llm_graph_context::build_moe_ffn and added coverage for exp_probs_b and some other missing combinations. This exposed a bug in both CUDA and Vulkan backends where they were assuming the input to argsort and the input to get_rows are the same. I'd like to optimize this graph in another change, but for now just get it functional.

CUDA also had a bug where it got n_experts from the wrong place, leading to GGML_ASSERT failures in some of the new tests.
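
To illustrate why the argsort input and the get_rows input can differ, here is a minimal CPU sketch in plain C++. It is not the ggml, CUDA, or Vulkan code. It assumes that exp_probs_b acts as a per-expert bias added to the probabilities only for expert selection, while the routing weights are gathered from the unbiased probabilities. The names Routing and route_top_k are hypothetical and exist only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: selected expert ids and their routing weights.
struct Routing {
    std::vector<int>   ids;
    std::vector<float> weights;
};

// Illustrative top-k routing for a single token.
// probs: per-expert probabilities (e.g. after softmax or sigmoid).
// bias:  optional per-expert selection bias (assumed role of exp_probs_b);
//        pass an empty vector for "no bias".
Routing route_top_k(const std::vector<float>& probs,
                    const std::vector<float>& bias,
                    std::size_t k) {
    const std::size_t n = probs.size();
    assert(bias.empty() || bias.size() == n);

    // Selection scores: this is the tensor that gets sorted.
    std::vector<float> selection(probs);
    if (!bias.empty()) {
        for (std::size_t i = 0; i < n; ++i) {
            selection[i] += bias[i];
        }
    }

    // Argsort in descending order of the selection scores.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return selection[a] > selection[b]; });

    // Gather weights from the unbiased probs, not from the selection scores.
    // Treating the two inputs as the same tensor would return biased weights.
    Routing out;
    k = std::min(k, n);
    for (std::size_t i = 0; i < k; ++i) {
        out.ids.push_back(order[i]);
        out.weights.push_back(probs[order[i]]);
    }
    return out;
}
```

With probs = {0.5, 0.3, 0.2}, bias = {0, 0, 0.4} and k = 1, this sketch selects expert 2 because of the bias, but the returned weight is the unbiased probs[2]. A fused kernel that reads the weight from the sorted tensor would return the biased score instead.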