Prevent initialization of SigLIP's `lecun_normal_` and `default_flax_embed_init` under DeepSpeed ZeRO-3 (#43574)
* Prevent redundant initialization in `lecun_normal_` and `default_flax_embed_init`
* apply style
* fix check_repository_consistency
* Move `lecun_normal_` & `default_flax_embed_init` into `initialization.py`
* Update src/transformers/initialization.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Rename `_variance_scaling_` to `_variance_scaling` for consistency and update references
* Refactor initialization calls in `Phi4MultimodalVisionPreTrainedModel`: remove redundant `init.` prefix for clarity
* Fix initialization calls in `Phi4MultimodalVisionPreTrainedModel`: update to use `init` namespace for clarity
* Add test for SigLIP model initialization with DeepSpeed ZeRO-3
* Update tests/deepspeed/test_deepspeed.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Update tests/deepspeed/test_deepspeed.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Apply suggestion from @vasqu
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* fix: update embedding initialization function to use the correct suffix
* add test for variance scaling initialization with DeepSpeed ZeRO-3 in SigLIP models
* small nits
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
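The guard these commits describe can be sketched roughly as follows. This is a hypothetical pure-PyTorch stand-in for the `lecun_normal_` helper (truncated normal with std = sqrt(1 / fan_in)), not the exact transformers implementation; the point it illustrates is the ZeRO-3 check: a parameter that is partitioned by DeepSpeed and not currently gathered has an empty local shard, so initializing it would be redundant and its fan-in could not even be computed.

```python
import math

import torch
from torch import nn


def lecun_normal_(tensor: torch.Tensor) -> torch.Tensor:
    """Sketch of a LeCun-normal initializer: truncated normal with
    std = sqrt(1 / fan_in), guarded against ZeRO-3 partitioned params.

    Hypothetical helper for illustration; names and the exact guard are
    assumptions, not the transformers code this PR touches.
    """
    # Under DeepSpeed ZeRO-3, a parameter outside a gather context is an
    # empty placeholder on this rank; skip it instead of re-initializing.
    if tensor.numel() == 0:
        return tensor
    # fan_in for a 2D (out_features, in_features) weight; assumption for
    # this sketch (conv weights would also multiply in the kernel size).
    fan_in = tensor.shape[1] if tensor.dim() > 1 else tensor.shape[0]
    std = math.sqrt(1.0 / fan_in)
    # Compensate for the variance lost by truncating at +/- 2 std devs.
    std = std / 0.87962566103423978
    with torch.no_grad():
        return nn.init.trunc_normal_(tensor, mean=0.0, std=std, a=-2 * std, b=2 * std)
```

With this shape, the init helpers can be called unconditionally from `_init_weights`: on a non-partitioned run they behave normally, and under ZeRO-3 they become no-ops for shards that are not materialized locally.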