transformers
cac0a28c - Add afmoe model (#42168)

Committed 53 days ago
* Add AFMoE model support
* Address review feedback for AFMoE implementation
* Add flex attention support to AFMoE model
* Fix expert_bias routing in AFMoE
* Remove test-results directory
* Address PR review feedback for AFMoE model
* fix(afmoe): ensure RMSNorm output dtype matches input dtype
* properly return attn weights
* fix most tests
* cleanup

  Remove shared expert if/else, as it defaults to 2.
  Remove `route_norm`, as it defaults to `True`.
  Make test smaller and faster.
* fix input embeds API
* update rope API, smaller test and should be good to go
* oops, wrong place to skip unittest
* quality
* update
* rope parameter docstring fill

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
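The `fix(afmoe)` RMSNorm dtype item can be illustrated with a minimal sketch of the common mixed-precision RMSNorm pattern. This is written in NumPy rather than PyTorch and is not taken from the AFMoE code: it upcasts to float32 to compute the mean of squares, applies the scale, then casts the result back to the input dtype. The function name `rms_norm` and the `eps` default are assumptions chosen for illustration.

```python
import numpy as np


def rms_norm(hidden_states: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMS-normalize over the last axis and return the caller's dtype.

    Hypothetical helper for illustration; not the AFMoE implementation.
    """
    input_dtype = hidden_states.dtype
    # Upcast so the mean of squares does not lose precision in float16.
    x = hidden_states.astype(np.float32)
    variance = np.mean(x * x, axis=-1, keepdims=True)
    x = x / np.sqrt(variance + eps)
    # Apply the learned scale in float32, then cast back so the output
    # dtype matches the input dtype regardless of the dtype of `weight`.
    return (weight.astype(np.float32) * x).astype(input_dtype)


hidden = np.array([[3.0, 4.0], [1.0, -1.0]], dtype=np.float16)
scale = np.ones(2, dtype=np.float32)  # without the final cast, the result would be float32
out = rms_norm(hidden, scale)
print(out.dtype)  # float16
```

The final `.astype(input_dtype)` is the step that keeps a half-precision activation stream from being silently promoted to float32 by the normalization layer.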