models : kda chunk size = 16 (#19827)

Commit

95 days ago

models : kda chunk size = 16 (#19827)

* models : add llm_build_delta_net_base
* cont : keep qwen35 and qwen35moe graphs intact
* cont : add comments [no ci]
* add kimi linear to delta-net-base
* removed unnecessary ggml_cont from g_exp_t
* removed ggml_cont from g_diff_exp_t. moved ggml_cont for o to kimi-linear.cpp
* removed unnecessary diag mask
* cont : simplify
* cont : avoid graph splits
* scale q after mul instead of beginning
* scale q after mul instead of beginning
* identical ppl
* cont : fix scale and decay mask
* minor : remove TODO
* block implementation for kda
* remove space at the end of line 101
* concat+pad
* pad+binary row concat
* chunk size 16 for kda
* removed minor differences to master

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

References

#19827 - Kimi Linear chunk size = 16

Author

ymcki

Parents

2cd20b72

llama.cpp a0ed91a4 - models : kda chunk size = 16 (#19827)
