DeepSpeed
38327e07 - Bug Fix for offload_states API (#7050)

Commit

133 days ago

Bug Fix for offload_states API (#7050)

@fukun07 and I discovered a bug when using the `offload_states` and `reload_states` APIs of the Zero3 optimizer. When using grouped parameters (for example, in weight decay or grouped lr scenarios), the order of the parameters mapping in `reload_states` ([here](https://github.com/deepspeedai/DeepSpeed/blob/14b3cce4aaedac69120d386953e2b4cae8c2cf2c/deepspeed/runtime/zero/stage3.py#L2953)) does not correspond with the initialization of `self.lp_param_buffer` ([here](https://github.com/deepspeedai/DeepSpeed/blob/14b3cce4aaedac69120d386953e2b4cae8c2cf2c/deepspeed/runtime/zero/stage3.py#L731)), which leads to misaligned parameter loading. This issue was overlooked by the corresponding unit tests ([here](https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/runtime/zero/test_offload_states.py)), so we fixed the bug in our PR and added the corresponding unit tests.

---------

Signed-off-by: Wei Wu <wuwei211x@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
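The kind of misalignment described in the commit message can be illustrated with a toy sketch. This is not DeepSpeed code: plain Python lists stand in for tensors and for the flat buffer, the helpers `pack` and `unpack` and the parameter names are hypothetical, and which of the two orders the real code uses on each side is an assumption made only for illustration.

```python
# Toy illustration only: lists stand in for tensors and the flat buffer.
# `pack`, `unpack`, and all parameter names are hypothetical.

def pack(params, order):
    """Lay parameters out contiguously, walking `order`."""
    flat = []
    for name in order:
        flat.extend(params[name])
    return flat


def unpack(flat, sizes, order):
    """Slice the flat buffer back into named parameters, walking `order`."""
    out, offset = {}, 0
    for name in order:
        n = sizes[name]
        out[name] = flat[offset:offset + n]
        offset += n
    return out


params = {"w1": [1.0, 1.0, 1.0], "b1": [2.0], "w2": [3.0, 3.0], "b2": [4.0]}
sizes = {name: len(values) for name, values in params.items()}

# Order in which the buffer is laid out (assumed here: model order).
model_order = ["w1", "b1", "w2", "b2"]

# Hypothetical grouping, e.g. weights with weight decay, biases without.
param_groups = [["w1", "w2"], ["b1", "b2"]]
group_order = [name for group in param_groups for name in group]

flat = pack(params, model_order)

good = unpack(flat, sizes, model_order)  # same order on both sides
bad = unpack(flat, sizes, group_order)   # mismatched order

print(good == params)
print(bad["w2"])
```

With a single parameter group the two orders coincide, which may be why a test without grouped parameters would not surface the problem.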

References

#7050 - Bug Fix for offload_states API

Author

U-rara

Parents

9f20148a

Files (2)

deepspeed/runtime/zero
- stage3.py
tests/unit/runtime/zero
- test_offload_states.py
