Fix ZeRO-3 from_pretrained: load registered buffers in _load_state_dict_into_zero3_model (#45402)
* Fix Gemma4 ZeRO-3 weight loading by correcting base_model_prefix in AudioModel and VisionModel
* Revert VisionModel base_model_prefix change per review feedback
* Fix ZeRO-3 loading: handle buffers in _load_state_dict_into_zero3_model
Buffers registered via register_buffer() were skipped entirely
during from_pretrained() under DeepSpeed ZeRO-3. The load() function
in _load_state_dict_into_zero3_model iterated only over named_parameters(),
never named_buffers(), so buffer values from the checkpoint were never
loaded and were always reported as MISSING.
Fix: after gathering and loading parameters, load buffers directly.
No GatheredParameters context is needed because ZeRO-3 does not shard
buffers.
Fixes #45397
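The failure mode described above can be reproduced without DeepSpeed. The following is a minimal sketch in plain PyTorch, not the actual Transformers code: TinyModel and load_params_only are hypothetical names, and the loader only mimics a loop that walks named_parameters() and nothing else.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical module with two parameters and one registered buffer."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.register_buffer("scale", torch.ones(2))


def load_params_only(model, state_dict):
    """Hypothetical loader that visits named_parameters() only."""
    loaded = set()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in state_dict:
                param.copy_(state_dict[name])
                loaded.add(name)
    # Checkpoint keys the loop never visited; registered buffers end up here.
    return sorted(set(state_dict) - loaded)


# Build a "checkpoint" whose buffer differs from the default value.
source = TinyModel()
source.scale.fill_(3.0)
checkpoint = {k: v.clone() for k, v in source.state_dict().items()}

# Load into a fresh model: parameters are copied, the buffer is not.
target = TinyModel()
skipped_keys = load_params_only(target, checkpoint)
```

In this sketch the buffer key is the only checkpoint entry left unvisited, and target.scale keeps its freshly initialized value, which mirrors the reported symptom.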
* Fix indentation in _load_state_dict_into_zero3_model buffer handling
* Add test for ZeRO-3 registered buffer loading
* fix: organize imports and remove unused variable in deepspeed test
* fix: apply ruff formatting to deepspeed test
* fix: copy buffers on all ranks and set _is_hf_initialized in ZeRO-3 load
- Remove rank==0 guard so buffers are copied on all ranks
- Set buf._is_hf_initialized = True after copy to prevent re-initialization
- Update test to verify buffer VALUES survive ZeRO-3 from_pretrained round-trip
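The final behavior listed above (copy on every rank, mark the buffer as initialized, check values) might look roughly like the sketch below. It is an illustration in plain PyTorch under stated assumptions: TinyModel and load_buffers are hypothetical names, there is no distributed setup, and the real change lives inside _load_state_dict_into_zero3_model.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical module with one registered buffer."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.register_buffer("scale", torch.ones(2))


def load_buffers(model, state_dict):
    """Hypothetical helper: copy checkpoint values into registered buffers.

    There is deliberately no rank check, so every process that calls this
    performs the copy, since buffers are not sharded.
    """
    loaded = []
    with torch.no_grad():
        for name, buf in model.named_buffers():
            if name in state_dict:
                buf.copy_(state_dict[name])
                # Flag named in the commit; it marks the buffer as already
                # initialized so later init code leaves it alone.
                buf._is_hf_initialized = True
                loaded.append(name)
    return loaded


# A "checkpoint" holding a non-default buffer value.
checkpoint = {"scale": torch.full((2,), 3.0)}

target = TinyModel()
loaded_names = load_buffers(target, checkpoint)
```

Checking the buffer value after loading, rather than only the absence of a MISSING report, is what makes a round-trip test meaningful.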
---------
Co-authored-by: saslifat-gif <saslifat@email.com>