transformers
4df39f47 - Fix lm_head weight tying in Mistral3ForConditionalGeneration (#43429)

Commit
54 days ago
Fix lm_head weight tying in Mistral3ForConditionalGeneration (#43429)

* Fix lm_head weight tying in Mistral3ForConditionalGeneration

- Add tie_weights() method that checks both config.tie_word_embeddings and config.text_config.tie_word_embeddings
- Fixes gibberish output in Ministral-3-3B-Instruct-2512 model
- The 3B model has text_config.tie_word_embeddings=True but the base class only checked top-level config.tie_word_embeddings=False

* Fix lm_head weight tying in Mistral3ForConditionalGeneration

- Add tie_weights() method that checks both config.tie_word_embeddings and config.text_config.tie_word_embeddings
- Fixes gibberish output in Ministral-3-3B-Instruct-2512 model
- The 3B model has text_config.tie_word_embeddings=True but the base class only checked top-level config.tie_word_embeddings=False

* Fix tie_weights method signature to match base class

- Update tie_weights() to accept missing_keys and recompute_mapping parameters
- Pass parameters correctly to super().tie_weights()
- Fixes TypeError when loading models

* Fix lm_head weight tying in Mistral3ForConditionalGeneration

- Add tie_weights() method that checks both config.tie_word_embeddings and config.text_config.tie_word_embeddings
- Fixes gibberish output in Ministral-3-3B-Instruct-2512 model
- The 3B model has text_config.tie_word_embeddings=True but the base class only checked the top-level config.tie_word_embeddings=False
- Update method signature to match base class requirements

* Fix modeling file (auto-generated from modular)

- Update modeling_mistral3.py to match modular_mistral3.py changes
- Replace _tie_or_clone_weights with direct weight assignment
- Note: In normal development, this would be auto-generated by CI

* Fix code formatting for ruff compliance

- Break long lines to meet ruff formatting requirements
- Use double quotes for consistency
- Fixes CI check_code_quality failure

* Set default tie_word_embeddings for Mistral3Config

- Add tie_word_embeddings=True default to Mistral3Config so base tie_weights logic applies
- Remove custom tie_weights overrides from mistral3 modular/modeling

* Update Mistral3Config docstring for tie_word_embeddings

---------

Co-authored-by: aswin.mr <aswin.mr@pearlsofttechnologies.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
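
The config mismatch described in the message can be illustrated with a small, library-free sketch. Everything here is a hypothetical stand-in (`TextConfig`, `CompositeConfig`, `TinyModel`, and both predicate functions are invented for illustration and are not the transformers API); plain Python lists take the place of weight tensors. It only shows why a check limited to the top-level flag leaves `lm_head` untied, and how either checking both flags or defaulting the top-level flag to `True` restores the tie.

```python
from dataclasses import dataclass, field


@dataclass
class TextConfig:
    # Hypothetical stand-in for the nested text_config of the 3B model.
    tie_word_embeddings: bool = True


@dataclass
class CompositeConfig:
    # Hypothetical stand-in for the top-level config before the fix,
    # where the top-level flag is False.
    text_config: TextConfig = field(default_factory=TextConfig)
    tie_word_embeddings: bool = False


def tie_top_level_only(config) -> bool:
    # Mirrors the behaviour the message attributes to the base class:
    # only the top-level flag is consulted.
    return config.tie_word_embeddings


def tie_checking_both(config) -> bool:
    # Mirrors the intermediate override: either flag enables tying.
    return config.tie_word_embeddings or config.text_config.tie_word_embeddings


class TinyModel:
    def __init__(self, config):
        self.config = config
        self.embed_tokens = [[1.0, 2.0], [3.0, 4.0]]
        # Placeholder values standing in for an lm_head that was never tied.
        self.lm_head = [[0.0, 0.0], [0.0, 0.0]]

    def tie_weights(self, should_tie) -> None:
        if should_tie(self.config):
            # Tying means sharing one object, not copying values.
            self.lm_head = self.embed_tokens


# Top-level flag False, nested flag True, top-level-only check: not tied.
buggy = TinyModel(CompositeConfig())
buggy.tie_weights(tie_top_level_only)

# Same config, but the check looks at both flags: tied.
override = TinyModel(CompositeConfig())
override.tie_weights(tie_checking_both)

# Final approach in the message: top-level flag set to True,
# so the unmodified top-level-only check is enough.
final = TinyModel(CompositeConfig(tie_word_embeddings=True))
final.tie_weights(tie_top_level_only)

print(buggy.lm_head is buggy.embed_tokens)        # False
print(override.lm_head is override.embed_tokens)  # True
print(final.lm_head is final.embed_tokens)        # True
```

The third case corresponds to the direction the commit finally took: rather than keeping a model-specific override, the config flag is set so that the shared logic applies unchanged.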