Add support for MiniMax-M2 (#42028)
* update: init m2
Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: docs and config
Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: init minimax-m2 test
Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: fix tests
Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: use partial_rotary_factor
Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: some fix
Signed-off-by: xuebi <xuebi@minimaxi.com>
* fix: import Unpack from processing_utils
Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: apply suggestions from code review
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* update: remove MiniMaxM2DecoderLayer and MiniMaxM2MLP
Signed-off-by: xuebi <xuebi@minimaxi.com>
* update: remove use_qk_norm
* update: remove unused use_qk_norm
* update: update config and attention
* update: add to tokenization_auto and remove unused test
* update: fix decoder layer and experts
* update: fix docs
* update: make CI happy
* refactor: use mapping
* update: remove unused comments
* update: fix rope_params and router
* update: remove rope_theta
* update: test_load_balancing_loss
* update: docs
* update: fix default theta
* update to proper default values and proper RoPE config, simplify modular file
* fix docs
* modular fixup
* address review comments
* update slow tests
* style
* fp32 strict
* revert the flag
* sync with latest changes
* fixup buffer init
* add cache exception to MiniMax-M2 as we have a naming clash
* fix dtype issue in gate
* lift fp8 test restriction and apply new linter rules
* update docs
---------
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>