transformers
7a2bf25f - Add support for MiniMax-M2 (#42028)

Commit
5 days ago
Add support for MiniMax-M2 (#42028)

* update: init m2
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: docs and config
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: init minimax-m2 test
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: fix tests
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: use partial_rotary_factor
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: some fix
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* fix: import Unpack from processing_utils
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: apply suggestions from code review
  Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* update: remove MiniMaxM2DecoderLayer and MiniMaxM2MLP
  Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: remove use_qk_norm
* update: remove unused use_qk_norm
* update: update config and attention
* update: add to tokenization_auto and remove unused test
* update: fix decoder layer and experts
* update: fix docs
* update: make ci happy
* refactor: use mapping
* update: remove unused comments
* update: fix rope_params and router
* update: remove rope_theta
* update: test_load_balancing_loss
* update: docs
* update: fix default theta
* update to proper default values, proper config rope, simplified modular
* fix docs
* modular fixup
* review comments
* update slow tests
* style
* fp32 strict
* revert the flag
* sync with latest changes
* fixup buffer init
* add cache exception to minimax m2 as we have a naming clash
* fix dtype issue in gate
* lift fp8 test restriction and apply new linter rules
* update docs

---------

Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>