Megatron-DeepSpeed
0f23a729 - Reshape deepspeed checkpoint (#239)

Commit
3 years ago
Reshape deepspeed checkpoint (#239) * Reshape deepspeed checkpoint * add checkpoint tests * Validate input folder * Tests for tp/pp reshape * remove debug folders * fix test_checkpoint_reshaping_empty_dir * Fix unit tests * Remove deepspeed checkpoint utils * Use DS 3D reshaping utils * convert to bf16 * wip universal chkpt * rename * rename * wip on fragments dealing * cleanup * Loading universal checkpoint with reshaping * all gpu1<->2 reshapes work * param attrs * make the tests adaptable to the number of available gpus * WIP * WIP * WIP * WIP * Debug functions * args should be required, don't create another latest file * Parallelize shard extraction * close+join pool; add tqdm; comment out noise * rename * parameterize * Parallel slice merging * Cleanup * allow inspection on a machine w/o gpus * test against the right DS branch * DS size was merged Co-authored-by: Stas Bekman <stas@stason.org>
Author
Parents
Loading