GLM-4 Update (#39393)
* one commit with full
* Create glm4_moe.md
* Update check_config_docstrings.py
* Update __init__.py
* update
* argue
* argue: router problem
* 1
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update modular_glm4_moe.py
* update
* use dsv3 pretrainmodel in modular
* update for test
* update new modular
* use LlamaAttention instead of CohereAttention to avoid a repeated norm
* update the modular
* update attn modular
* update
* Update modular_glm4_moe.py
* MTP layer needs to be ignored
* fix gradient error using the dots_1 method
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>