llama.cpp
CUDA: refactor topk-moe to enable more models (GLM 4.7, Nemotron etc.)
#19126

Merged

am17an merged 3 commits into ggml-org:master from am17an:topk-cuda-refactor

CUDA: refactor topk-moe to enable more models (GLM, Nemotron etc.)

bcbe257b

github-actions added Nvidia GPU

github-actions added ggml

template bias

1ae43b9b

am17an force pushed from 3245ced7 to 1ae43b9b 65 days ago

JohannesGaessler approved these changes on 2026-01-27

review: formatting

eeb9b04a

am17an force pushed from 3ad63db7 to eeb9b04a 64 days ago

am17an merged 3bcc9909 into master 63 days ago

am17an deleted the topk-cuda-refactor branch 63 days ago

Reviewers

JohannesGaessler

Assignees

No one assigned

Labels

Nvidia GPU, ggml

Milestone

No milestone