llama.cpp
1c5eba6f - llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)

Squashed commit message (#8197):

* Add attention and final logit soft-capping.
* fix
* Add custom add_ functions.
* Disable flash attention for Gemma2.
* Update src/llama.cpp (co-authored-by: slaren <slarengh@gmail.com>).
* Add default values for the attention and final logit softcaps.
* Add custom kq scaling from Gemma2Attention.
* Remove custom pre-attention scaling and use the computed value instead.

Co-authored-by: slaren <slarengh@gmail.com>
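For context: soft-capping bounds a logit smoothly to the interval (-cap, +cap) by passing it through a scaled tanh, i.e. logits = cap * tanh(logits / cap). Below is a minimal scalar sketch in C++; the function name is illustrative, and the example cap values (50.0 for attention logits, 30.0 for final logits) are the Gemma2 defaults this commit refers to. In the tree itself the same math is expressed as ggml graph operations rather than a scalar helper.

```cpp
#include <cmath>
#include <cstdio>

// Soft-capping: smoothly bound a logit to (-cap, +cap) via a scaled tanh.
// The commit applies this both to the attention logits (before softmax)
// and to the final output logits for Gemma2.
static float softcap(float logit, float cap) {
    return cap * std::tanh(logit / cap);
}

int main() {
    // Example caps per Gemma2's configuration: 50.0 for attention logits,
    // 30.0 for the final logits (the defaults added by this commit).
    printf("%f\n", softcap(100.0f, 50.0f)); // ~48.20: squashed below the cap
    printf("%f\n", softcap(  5.0f, 30.0f)); // ~4.95: near-identity for small values
    return 0;
}
```

Because tanh is near-identity for small inputs, soft-capping leaves typical logits almost untouched while keeping outliers bounded. The fused flash-attention kernel has no hook to apply this tanh between QK^T and the softmax, which is why the commit disables flash attention for Gemma2.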