Megatron-DeepSpeed
a branch combining layer-norm-auto-sync and ds_ckpt_reshape
#292
Open
Commits (45)
Reshape deepspeed checkpoint (tjruwase, 4 years ago)
Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape (stas00, 4 years ago)
add checkpoint tests (stas00, 4 years ago)
Validate input folder (tjruwase, 4 years ago)
Tests for tp/pp reshape (tjruwase, 4 years ago)
remove debug folders (stas00, 4 years ago)
fix test_checkpoint_reshaping_empty_dir (stas00, 4 years ago)
Fix unit tests (tjruwase, 4 years ago)
Remove deepspeed checkpoint utils (tjruwase, 4 years ago)
Use DS 3D reshaping utils (tjruwase, 4 years ago)
sync layer norms (stas00, 4 years ago)
all_reduce is an in_place operation (thomasw21, 4 years ago)
Make dataloader use another random generator (#276) (thomasw21, 4 years ago)
do all_reduce op.AVG directly (stas00, 4 years ago)
Merge remote-tracking branch 'origin/main' into layer-norm-auto-sync (thomasw21, 3 years ago)
add eval dataloader deadlock workaround (stas00, 3 years ago)
Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape (stas00, 3 years ago)
convert to bf16 (stas00, 3 years ago)
wip universal chkpt (stas00, 3 years ago)
rename (stas00, 3 years ago)
rename (stas00, 3 years ago)
revert generator sync (stas00, 3 years ago)
wip on fragments dealing (stas00, 3 years ago)
cleanup (stas00, 3 years ago)
Loading universal checkpoint with reshaping (tjruwase, 3 years ago)
all gpu1<->2 reshapes work (stas00, 3 years ago)
param attrs (tjruwase, 3 years ago)
make the tests adaptable to the number of available gpus (stas00, 3 years ago)
WIP (tjruwase, 3 years ago)
WIP (tjruwase, 3 years ago)
WIP (tjruwase, 3 years ago)
WIP (tjruwase, 3 years ago)
Merge remote-tracking branch 'origin/main' into layer-norm-auto-sync (stas00, 3 years ago)
Debug functions (tjruwase, 3 years ago)
args should be required, don't create another latest file (stas00, 3 years ago)
Parallelize shard extraction (tjruwase, 3 years ago)
close+join pool; add tqdm; comment out noise (stas00, 3 years ago)
rename (stas00, 3 years ago)
parameterize (stas00, 3 years ago)
Parallel slice merging (tjruwase, 3 years ago)
Cleanup (tjruwase, 3 years ago)
Merge remote-tracking branch 'origin/main' into ds_ckpt_reshape (stas00, 3 years ago)
Merge remote-tracking branch 'origin/main' into layer-norm-auto-sync (stas00, 3 years ago)
allow inspection on a machine w/o gpus (stas00, 3 years ago)
Merge branch 'layer-norm-auto-sync' into ds_ckpt_reshape-with-layer-norm-auto-sync (stas00, 3 years ago)