llama.cpp
506bb6e0 - model: try to improve Qwen3 Next (#18683)

Commit

36 days ago

model: try to improve Qwen3 Next (#18683)

* qwen3next: simplify qkvz projection
* use ggml_swiglu_split
* revert swiglu_split, but remove redundant repeat()
* fix missing reshape
* rm 2 redundant transposes
* move mul_mat(k,q) to outside of chunking
* rm redundant cont
* improve g_cs_chunk
* add comments about no cont
* use std::pair instead of ggml_concat
* vectorize key_gdiff calculation
* rm unused tensor
* avoid ggml_concat inside loop
* bring back ggml_concat as it may not work on other backend
* nits

References

#18683 - model: try to improve Qwen3 Next

Author

ngxson

Parents

79456a69
