llama.cpp
1725e316 - models : optimize qwen3next graph (#19375)

models : optimize qwen3next graph (#19375)

* models : optimizing qwen3next graph
* cont
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* cont : remove redundant q, g chunking
* minor
* minor
* avoid passing masks around
* avoid concats during chunking (see the sketch after this message)
* naming + shapes
* update names and use prefix to disable CUDA graphs
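The "avoid concats during chunking" bullet presumably amounts to a standard ggml pattern: operate on strided views of the activations and write each chunk's result into a view of a preallocated output tensor, instead of splitting the input and stitching the pieces back together with `ggml_concat`. The snippet below is a minimal, hypothetical sketch of that pattern using the public ggml C API; it is not the actual qwen3next graph code, and `build_chunked` and the per-chunk op (`ggml_scale`) are placeholders.

```cpp
// Minimal, hypothetical sketch of concat-free chunking with the ggml C API.
// NOT the actual qwen3next graph code: build_chunked() and the per-chunk op
// (ggml_scale) are placeholders for illustration only.
#include "ggml.h"

#include <algorithm>

static void build_chunked(struct ggml_context * ctx, struct ggml_cgraph * gf,
                          struct ggml_tensor * x, int64_t n_chunk) {
    const int64_t d        = x->ne[0]; // embedding size
    const int64_t n_tokens = x->ne[1]; // sequence length

    // preallocated result tensor - per-chunk outputs are written into views of
    // it, so no ggml_concat node is needed to stitch the chunks back together
    struct ggml_tensor * out = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, d, n_tokens);

    for (int64_t i0 = 0; i0 < n_tokens; i0 += n_chunk) {
        const int64_t nc = std::min(n_chunk, n_tokens - i0);

        // strided views over columns [i0, i0 + nc) - no copies of x or out
        struct ggml_tensor * xc = ggml_view_2d(ctx, x,   d, nc, x->nb[1],   i0*x->nb[1]);
        struct ggml_tensor * oc = ggml_view_2d(ctx, out, d, nc, out->nb[1], i0*out->nb[1]);

        // placeholder for the real per-chunk computation
        struct ggml_tensor * yc = ggml_scale(ctx, xc, 1.0f);

        // write the chunk result directly into the output view
        ggml_build_forward_expand(gf, ggml_cpy(ctx, yc, oc));
    }

    // `out` now holds all chunks contiguously, with no concat in the graph
    ggml_build_forward_expand(gf, out);
}
```

Writing into views of a preallocated tensor also keeps the graph topology and node count fixed across chunks. The last bullet ("use prefix to disable CUDA graphs") refers to node naming: per the commit, the affected nodes get a name prefix that the CUDA backend uses to skip CUDA graph capture; the exact prefix string is not shown here.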