Megatron-DeepSpeed
Reshape deepspeed checkpoint
#239
Merged

stas00 merged 39 commits into main from ds_ckpt_reshape
tjruwase Reshape deepspeed checkpoint
67c08f09
tjruwase requested a review from stas00 4 years ago
stas00 Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape
fec1ec5f
stas00 add checkpoint tests
675f12ca
stas00 commented on 2022-01-25
stas00 commented on 2022-01-25
tjruwase Validate input folder
e379065b
tjruwase
stas00
tjruwase Tests for tp/pp reshape
a1068e4d
tjruwase commented on 2022-01-25
stas00 remove debug folders
115bd313
stas00 fix test_checkpoint_reshaping_empty_dir
cc2fad1f
stas00 commented on 2022-01-25
stas00 commented on 2022-01-25
stas00 commented on 2022-01-25
tjruwase Fix unit tests
b6733d57
tjruwase Remove deepspeed checkpoint utils
9bf7ac51
sdtblck
tjruwase Use DS 3D reshaping utils
29ca2bcc
stas00
thomwolf
StellaAthena
stas00 Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape
a3ef7783
stas00 convert to bf16
6d863582
tjruwase requested a review 3 years ago
stas00 wip universal chkpt
804b497d
stas00 rename
c29d3369
stas00 rename
9c447933
stas00 wip on fragments dealing
7e0a81b9
stas00 cleanup
d3005120
tjruwase Loading universal checkpoint with reshaping
ab0a7f8f
stas00 all gpu1<->2 reshapes work
d5e33dec
tjruwase param attrs
85ff56ca
stas00 make the tests adaptable to the number of available gpus
f01fa4a5
tjruwase WIP
f29bacc1
tjruwase WIP
dd0aeb67
tjruwase WIP
3bf14fdf
tjruwase WIP
7ae002d3
tjruwase Debug functions
55bb5148
stas00 args should be required, don't create another latest file
795fedbb
stas00
tjruwase Parallelize shard extraction
cc8810be
stas00 close+join pool; add tqdm; comment out noise
04d9ad0f
stas00 rename
bca5af4e
stas00 parameterize
721380b2
tjruwase Parallel slice merging
e8a1ccf1
tjruwase Cleanup
a247614b
Muennighoff commented on 2022-06-23
stas00 Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape
9bb3dc33
stas00 allow inspection on a machine w/o gpus
d845a1f0
stas00 Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape
9fa081b2
stas00 test against the right DS branch
90d720c0
stas00 Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape
9edd9393
stas00 DS size was merged
ebff495c
stas00 merged 0f23a729 into main 3 years ago
stas00 deleted the ds_ckpt_reshape branch 3 years ago
