llama.cpp
5582c49c - gemma : more consistent attention scaling for v2 and v3 (#13951)

Commit
101 days ago
gemma : more consistent attention scaling for v2 and v3 (#13951)

* gemma : fix attn scale for 27B
* cont : apply scale before attn
* cont : consistent attention scaling
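The commit message describes scaling the query tensor before the attention computation, with a per-model scale factor. Below is a minimal C++ sketch of that idea, not the actual llama.cpp code: the helper names (`gemma_attn_scale`, `scale_queries`) are hypothetical, and the 27B scalar (hidden_size / n_head = 4608 / 32 = 144, vs. the usual 1/sqrt(head_dim)) is taken from the public Gemma 2 configuration as an illustrative assumption.

```cpp
// Sketch only: apply the attention scale to Q once, before QK^T,
// so every Gemma variant goes through the same attention/softmax path.
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical helper: pick the pre-attention scalar for a Gemma variant.
// Gemma 2 27B uses hidden_size / n_head (4608/32 = 144) per its public config;
// other variants are assumed here to use head_dim.
static float gemma_attn_scale(int head_dim, bool is_gemma2_27b, int n_embd, int n_head) {
    const float pre_attn_scalar = is_gemma2_27b
        ? (float) n_embd / (float) n_head
        : (float) head_dim;
    return 1.0f / std::sqrt(pre_attn_scalar);
}

// "apply scale before attn": scale the query vector in place instead of
// folding the factor into the softmax later.
static void scale_queries(std::vector<float> & q, float scale) {
    for (float & x : q) {
        x *= scale;
    }
}

int main() {
    const int n_embd = 4608, n_head = 32, head_dim = 128; // Gemma 2 27B-like shapes

    std::vector<float> q(head_dim, 1.0f);
    const float scale = gemma_attn_scale(head_dim, /*is_gemma2_27b=*/true, n_embd, n_head);
    scale_queries(q, scale);

    std::printf("scale = %f (1/sqrt(144) for 27B, vs 1/sqrt(%d) otherwise)\n", scale, head_dim);
    return 0;
}
```

Pre-scaling Q keeps the downstream attention code identical across Gemma v2 and v3, which is the consistency the commit title refers to.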