llama.cpp
9a3ea685 - CUDA: Fix bug in topk-moe for gpt-oss (#16821)

CUDA: Fix bug in topk-moe for gpt-oss (#16821)

* CUDA: Fix bug in topk-moe for gpt-oss

  When using ggml_can_fuse_subgraph, the output nodes that are passed are wrong.
  This causes `test-backend-ops` to still fuse the nodes (because the nodes are
  not used elsewhere in the graph), but fusion does not actually happen in the
  real gpt-oss graph.

* fix for qwen3 too

* change ifndef to ifdef
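A minimal sketch of the fusion-eligibility rule the commit message describes. The `Node`, `Graph`, and `can_fuse_subgraph` below are simplified stand-ins, not the real ggml structs or the actual `ggml_can_fuse_subgraph` API: the point is only that a candidate subgraph may be fused when every intermediate node (any subgraph node not declared as an output) is consumed exclusively inside the subgraph. Passing the wrong output set therefore makes the check reject a real model graph (where an external node consumes the top-k result) even though an isolated test graph still passes.

```cpp
#include <cstdio>
#include <unordered_set>
#include <vector>

// Simplified stand-in types; the real ggml graph/tensor structs differ.
struct Node {
    const char*      name;
    std::vector<int> src; // indices of input nodes within the graph
};

struct Graph {
    std::vector<Node> nodes;
};

// Illustrative check: fusing is only safe if no node outside the subgraph
// reads an intermediate (non-output) node of the subgraph.
static bool can_fuse_subgraph(const Graph & g,
                              const std::vector<int> & subgraph,
                              const std::vector<int> & outputs) {
    std::unordered_set<int> in_sub(subgraph.begin(), subgraph.end());
    std::unordered_set<int> is_out(outputs.begin(), outputs.end());

    for (int i = 0; i < (int) g.nodes.size(); ++i) {
        if (in_sub.count(i)) {
            continue; // only external consumers matter
        }
        for (int s : g.nodes[i].src) {
            if (in_sub.count(s) && !is_out.count(s)) {
                // an intermediate escapes the subgraph -> fusing it away
                // would change observable results, so refuse
                return false;
            }
        }
    }
    return true;
}

int main() {
    // Hypothetical topk-moe-like chain: logits -> softmax -> top-k -> weights,
    // with an extra "router" node standing in for the external consumer of
    // the top-k result that exists in the real gpt-oss graph but not in an
    // isolated test graph.
    Graph g;
    g.nodes = {
        { "logits",  {}    },   // 0
        { "softmax", { 0 } },   // 1
        { "topk",    { 1 } },   // 2
        { "weights", { 2 } },   // 3
        { "router",  { 2 } },   // 4: external consumer of the top-k ids
    };

    const std::vector<int> sub = { 1, 2, 3 };

    // Wrong output set: node 2 is read by node 4 outside the subgraph, so
    // declaring only node 3 as an output rejects fusion on the real graph,
    // while a test graph without node 4 would still accept it.
    printf("outputs={3}:   %s\n", can_fuse_subgraph(g, sub, { 3 })    ? "fuse" : "no fuse");
    printf("outputs={2,3}: %s\n", can_fuse_subgraph(g, sub, { 2, 3 }) ? "fuse" : "no fuse");
    return 0;
}
```

With the corrected output set ({2, 3} in this sketch), the same check succeeds on the real-graph shape, which is the behavior the fix restores for gpt-oss and qwen3.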