llama.cpp
1c5eba6f - llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)

Squashed commit message (#8197):

* Add attention and final logit soft-capping.
* fix
* Add custom add_ functions.
* Disable flash attention for Gemma2.
* Update src/llama.cpp (co-authored-by: slaren <slarengh@gmail.com>).
* Add default values for the attention and final logit softcaps.
* Add custom kq scaling from Gemma2Attention.
* Remove custom pre-attention scaling and use the computed value instead.

Co-authored-by: slaren <slarengh@gmail.com>
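For context: soft-capping bounds a logit smoothly to the interval (-cap, +cap) by passing it through a scaled tanh, i.e. logits = cap * tanh(logits / cap). Below is a minimal scalar sketch in C++; the function name is illustrative, and the example cap values (50.0 for attention logits, 30.0 for final logits) are the Gemma2 defaults this commit refers to. In the tree itself the same math is expressed as ggml graph operations rather than a scalar helper.

```cpp
#include <cmath>
#include <cstdio>

// Soft-capping: smoothly bound a logit to (-cap, +cap) via a scaled tanh.
// The commit applies this both to the attention logits (before softmax)
// and to the final output logits for Gemma2.
static float softcap(float logit, float cap) {
    return cap * std::tanh(logit / cap);
}

int main() {
    // Example caps per Gemma2's configuration: 50.0 for attention logits,
    // 30.0 for the final logits (the defaults added by this commit).
    printf("%f\n", softcap(100.0f, 50.0f)); // ~48.20: squashed below the cap
    printf("%f\n", softcap(  5.0f, 30.0f)); // ~4.95: near-identity for small values
    return 0;
}
```

Because tanh is near-identity for small inputs, soft-capping leaves typical logits almost untouched while keeping outliers bounded. The fused flash-attention kernel has no hook to apply this tanh between QK^T and the softmax, which is why the commit disables flash attention for Gemma2.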