transformers
f67ebcd5 - Fix ZeRO-3 from_pretrained: load registered buffers in _load_state_dict_into_zero3_model (#45402)

Commit
9 days ago
Fix ZeRO-3 from_pretrained: load registered buffers in _load_state_dict_into_zero3_model (#45402)

* Fix Gemma4 ZeRO-3 weight loading by correcting base_model_prefix in AudioModel and VisionModel

* Revert VisionModel base_model_prefix change per review feedback

* Fix ZeRO-3 loading: handle buffers in _load_state_dict_into_zero3_model

  Buffers registered via register_buffer() were completely skipped during from_pretrained() under DeepSpeed ZeRO-3. The load() function in _load_state_dict_into_zero3_model only iterated over named_parameters, never named_buffers, so buffer values from checkpoint were never loaded and always reported as MISSING.

  Fix: after gathering and loading parameters, explicitly load buffers directly (no GatheredParameters needed since buffers are not sharded by ZeRO-3).

  Fixes #45397

* Fix indentation in _load_state_dict_into_zero3_model buffer handling

* Add test for ZeRO-3 registered buffer loading

* fix: organize imports and remove unused variable in deepspeed test

* fix: apply ruff formatting to deepspeed test

* fix: copy buffers on all ranks and set _is_hf_initialized in ZeRO-3 load

  - Remove rank==0 guard so buffers are copied on all ranks
  - Set buf._is_hf_initialized = True after copy to prevent re-initialization
  - Update test to verify buffer VALUES survive ZeRO-3 from_pretrained round-trip

---------

Co-authored-by: saslifat-gif <saslifat@email.com>
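The buffer problem described in this commit message can be illustrated with a minimal sketch in plain PyTorch. This is not the code from the commit: `ToyModule`, `load_params_only`, and `load_with_buffers` are hypothetical names, the sketch runs on CPU without DeepSpeed, and the parameter-gathering step that ZeRO-3 requires is left out. It only shows why a loader that walks `named_parameters()` alone leaves registered buffers at their initial values, and how an extra pass over `named_buffers()` might fix that.

```python
import torch
from torch import nn


class ToyModule(nn.Module):
    """Hypothetical module with one parameter and one registered buffer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(2))
        self.register_buffer("scale", torch.ones(2))


def load_params_only(model, state_dict):
    """Mimics the described bug: only parameters are visited.

    Returns the checkpoint keys that were never consumed.
    """
    unused = set(state_dict)
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in state_dict:
                param.copy_(state_dict[name])
                unused.discard(name)
    return unused


def load_with_buffers(model, state_dict):
    """Sketch of the described fix: a second pass copies buffers too.

    Under ZeRO-3 the parameter pass would need the sharded parameters
    gathered first; that step is omitted here. Per the commit message,
    buffers are not sharded, so they can be copied directly on every rank.
    """
    unused = load_params_only(model, state_dict)
    with torch.no_grad():
        for name, buf in model.named_buffers():
            if name in state_dict:
                buf.copy_(state_dict[name])
                # Flag named in the commit message; marks the buffer as
                # already loaded so it is not re-initialized afterwards.
                buf._is_hf_initialized = True
                unused.discard(name)
    return unused


checkpoint = {
    "weight": torch.tensor([1.0, 2.0]),
    "scale": torch.tensor([3.0, 4.0]),
}

buggy = ToyModule()
buggy_unused = load_params_only(buggy, checkpoint)

fixed = ToyModule()
fixed_unused = load_with_buffers(fixed, checkpoint)
```

With the parameter-only loader, `scale` keeps its initial value and its checkpoint key is left over, which matches the symptom the message describes. With the extra buffer pass, both tensors take the checkpoint values.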