DeepSpeed
799120e7 - Universal checkpoint for zero stage 1 (#2284)

Commit
2 years ago
Universal checkpoint for zero stage 1 (#2284)

* Refactor universal checkpointing and tensor fragments
* Formatting
* Support zero stage1; Expand TP dim
* Remove debug prints
* Detect sharded optimizer state
* Format fixes
* Encode reshaping guide
* More symbolic constants

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
  • deepspeed
    • checkpoint
      • __init__.py
      • constants.py
      • deepspeed_checkpoint.py
      • reshape_3d_utils.py
      • reshape_utils.py
      • universal_checkpoint.py
      • zero_checkpoint.py
    • runtime
      • bf16_optimizer.py
      • engine.py
      • zero
        • stage_1_and_2.py
    • utils
      • tensor_fragment.py