model : add grok-2 support #15539
add grok-2 support
551a64f2
type fix
301ba771
type fix
cf87c766
type fix
f582b845
"fix" vocab for invalid sequences
711ab174
fix expert tensor mapping and spaces in vocab
8edece83
add chat template
3ef6cf57
fix norm tensor mapping
25e4e5f0
rename layer_out_norm to ffn_post_norm
4a53f132
ensure ffn_post_norm is mapped
e0a0024e
fix experts merging
d7efed89
remove erroneous FFN_GATE entry
92266e96
concatenate split tensors and add more metadata
6b3f7755
process all expert layers and try cat instead of hstack
c5566638
add support for community BPE vocab
9f868763
fix expert feed forward length and ffn_down concat
5d4e4073
commit this too
3e83c648
add ffn_up/gate/down, unsure if sequence is right
b1627ce5
add ffn_gate/down/up to tensor names
00481afe
correct residual moe (still not working)
2e8b67b0
mess--
94bcbbfe
fix embedding scale being applied twice
b7675ea0
add built in chat template
6cf16aaf
change beta fast for grok if default value
4abde12c
remove spm vocab in favor of community bpe vocab
705f84a7
change attention temp length metadata type to integer
a8fa83f2
update attention temp length metadata
05b52fa5
remove comment
b7bfc9a6
Merge branch 'master' into cisc/grok-2
c0d755cd
CISC marked this pull request as ready for review 187 days ago
slaren commented on 2025-09-03
replace M_SQRT2 with std::sqrt(2)
0408a4fa
Merge branch 'master' into cisc/grok-2
ed4d8f22
add yarn metadata, move defaults to hparams
d032a1b0
ggerganov approved these changes on 2025-09-13
CISC merged b8e09f08 into master 175 days ago
CISC deleted the cisc/grok-2 branch 175 days ago