gemma : more consistent attention scaling for v2 and v3 #13951
gemma : fix attn scale for 27B
36469ad8
cont : apply scale before attn
67c4346e
ggerganov
marked this pull request as draft 103 days ago
cont : consistent attention scaling
fbc6df02
ggerganov
changed the title from "gemma : fix attn scale for 27B" to "gemma : more consistent attention scaling for v2 and v3" 102 days ago
ggerganov
marked this pull request as ready for review 102 days ago
ggerganov
merged 5582c49c into master 102 days ago
ggerganov
deleted the gg/gemma-fix-attn-scale branch 102 days ago