Universal Checkpoint for Sequence Parallelism (#4752)
This PR extends the [universal
checkpoint](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/universal_checkpointing)
to support DS sequence parallelism and training scenarios where pipeline
parallelism is not enabled.
The attached TensorBoard chart shows a training scenario (validation
curve) in which a GPT model is pre-trained with data parallelism
across 4 GPUs, with checkpoints saved at the 100th and 200th
iterations. The checkpoint from the 100th iteration is later loaded
for continual pre-training under a different configuration with more
GPU resources: data parallelism = 4 GPUs, sequence parallelism = 2
GPUs (8 GPUs in total).
<img width="1783" alt="Screenshot 2023-11-28 at 9 11 55 AM"
src="https://github.com/microsoft/DeepSpeed/assets/16696152/817141b9-2b37-4a3b-9a47-07324877a4eb">
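The resume-with-a-different-topology workflow described above can be sketched as follows. Script paths, flag names, and iteration numbers are drawn from the Megatron-DeepSpeed universal checkpointing example linked above and should be treated as illustrative, not exact:

```shell
# Sketch of the continual pre-training workflow (illustrative paths/flags,
# based on the Megatron-DeepSpeed universal checkpointing example).

# 1. Convert the DeepSpeed checkpoint saved at iteration 100 into the
#    universal format.
python tools/convert_checkpoint/ds_to_universal.py \
    --input_folder  checkpoints/gpt/global_step100 \
    --output_folder checkpoints/gpt/global_step100_universal

# 2. Resume pre-training from the universal checkpoint with a different
#    topology: data parallelism = 4, sequence parallelism = 2 (8 GPUs
#    total), pipeline parallelism disabled.
deepspeed --num_gpus 8 pretrain_gpt.py \
    --ds-sequence-parallel-size 2 \
    --load checkpoints/gpt/global_step100_universal \
    --universal-checkpoint \
    # ... remaining model/training arguments as in the original run
```

The key point is that the universal format is topology-agnostic: the converted checkpoint can be reloaded under a parallelism layout different from the one it was saved with.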
---------
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>