Universal checkpoint for ZeRO Stage 3 (#5475)
This PR enables universal checkpointing for ZeRO Stage 3.
Notes:
- The current implementation supports data parallelism.
- Universal checkpoint support for Stage 3 combined with tensor-slicing
model parallelism is still under development.
- Pipeline parallelism is not supported by ZeRO Stage 3, and hence is
not included in this universal checkpoint implementation.
In this PR:
- I've updated `deepspeed/checkpoint/ds_to_universal.py` to support
converting ZeRO checkpoints into Universal checkpoints.
- I've updated `deepspeed/runtime/zero/stage3.py` to enable loading
Universal checkpoints using the Stage 3 optimizer.
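
A minimal sketch of the convert-then-load workflow described above. It only builds the conversion command and a config fragment; it does not run DeepSpeed. The checkpoint paths are hypothetical, and the `--input_folder`/`--output_folder` flags and the `checkpoint.load_universal` config key are assumptions that should be checked against the installed DeepSpeed version.

```python
import json
import shlex

# Hypothetical paths; replace with a real ZeRO Stage 3 checkpoint tag directory.
zero3_ckpt_dir = "checkpoints/global_step100"
universal_ckpt_dir = "checkpoints/global_step100_universal"

# Step 1: command that converts the ZeRO checkpoint into a Universal checkpoint.
# Assumption: the script accepts --input_folder and --output_folder.
convert_cmd = [
    "python", "deepspeed/checkpoint/ds_to_universal.py",
    "--input_folder", zero3_ckpt_dir,
    "--output_folder", universal_ckpt_dir,
]

# Step 2: config fragment asking the Stage 3 optimizer to load a Universal
# checkpoint. Assumption: the key is "load_universal" under "checkpoint".
ds_config = {
    "zero_optimization": {"stage": 3},
    "checkpoint": {"load_universal": True},
}

print(shlex.join(convert_cmd))
print(json.dumps(ds_config, indent=2))
```

The printed command would be run once per checkpoint, before training resumes with the config fragment merged into the existing DeepSpeed config.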
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>