llama.cpp
10fcc412 - vulkan: Update topk_moe fusion to handle gpt's late softmax (#16656)

Commit

48 days ago

vulkan: Update topk_moe fusion to handle gpt's late softmax (#16656)

* vulkan: Update topk_moe fusion to handle gpt's late softmax

Based on #16649.

* Add ggml_check_edges

* Add sync logging to show fusion effects

* handle clamp added in #16655

* Update ggml/src/ggml-impl.h

Co-authored-by: Diego Devesa <slarengh@gmail.com>

References

#16656 - vulkan: Update topk_moe fusion to handle gpt's late softmax

Author

jeffbolznv

Parents

bcf5bda6
