Fix ZeRO-3 forward crash on modules with plain dict _parameters (#8009)
## Summary
Fixes #6961
ZeRO-3 forward crashes with `AttributeError: 'dict' object has no
attribute '_in_forward'` since torch 2.5. PyTorch changed
`nn.Module._parameters` from `OrderedDict` to plain `dict`
(pytorch/pytorch#129164), and a plain `dict` does not allow attribute
assignment.
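The container difference can be reproduced without torch or DeepSpeed.
The sketch below is illustrative only: `can_set_attr` is a hypothetical
helper, and `_in_forward` is simply the attribute name taken from the
error message above.

```python
from collections import OrderedDict


def can_set_attr(container):
    """Return True if an arbitrary attribute can be set on the container."""
    try:
        container._in_forward = True
        return True
    except AttributeError:
        # Built-in dict instances have no __dict__, so assignment fails.
        return False


plain_ok = can_set_attr({})               # plain dict, as in torch >= 2.5
ordered_ok = can_set_attr(OrderedDict())  # OrderedDict, as in older torch
print(plain_ok, ordered_ok)
```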
DeepSpeed wraps every module's `_parameters` in `ZeROOrderedDict` at
engine init via `_inject_parameters`. Any module not present at that
point keeps the plain dict and crashes on its next forward. This
includes a submodule attached after `deepspeed.initialize()` (for
example PEFT/LoRA adapters) or a module restored by
`deepspeed/compile/init_z3.py:35`.
The fix adds `ensure_zero_ordered_dict()` and calls it from the forward
prologue. It wraps lazily, is idempotent, and keeps the original
container so the deepcompile un-injection path still works. The epilogue
gets an `isinstance` guard for modules that show up between the two
hooks.
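A minimal sketch of the lazy, idempotent wrapping idea, using stand-ins
rather than the real DeepSpeed code. `SketchZeroDict`, `SketchModule`,
`ensure_wrapped`, `forward_prologue`, and `forward_epilogue` are all
hypothetical names, and the actual `ensure_zero_ordered_dict()` may
differ in signature and detail.

```python
from collections import OrderedDict


class SketchZeroDict(OrderedDict):
    """Hypothetical stand-in for ZeROOrderedDict (not the DeepSpeed class)."""

    def __init__(self, original):
        super().__init__(original)
        self._in_forward = False
        # Keep the original container so an un-injection path could restore it.
        self._original = original


class SketchModule:
    """Hypothetical stand-in for an nn.Module with a plain-dict _parameters."""

    def __init__(self):
        self._parameters = {"weight": 1.0}


def ensure_wrapped(module):
    """Wrap module._parameters on first use; return True only if it wrapped."""
    params = module._parameters
    if isinstance(params, SketchZeroDict):
        return False  # already wrapped, so repeated calls are no-ops
    module._parameters = SketchZeroDict(params)
    return True


def forward_prologue(module):
    ensure_wrapped(module)
    module._parameters._in_forward = True


def forward_epilogue(module):
    # Guard: a module that appeared after the prologue is still a plain dict.
    if isinstance(module._parameters, SketchZeroDict):
        module._parameters._in_forward = False
```

In this sketch the second call to `ensure_wrapped` returns `False`, which
mirrors the "repeated forwards do not re-wrap" behaviour, and the epilogue
guard leaves an unwrapped module untouched instead of raising.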
This only fixes the crash. Late-attached parameters are still not in the
optimizer and not partitioned by ZeRO-3. For full ZeRO-3 semantics on a
late adapter, build it inside `deepspeed.zero.Init()`.
## Tests
`tests/unit/runtime/zero/test_zero_late_module_attach.py`
- forward after attaching a Linear post-init, with `_parameters` forced
to plain dict so the bug reproduces on any torch version
- repeated forwards do not re-wrap an already-wrapped module
Signed-off-by: Sung Hyun Cho <hope5487@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>