DeepSpeed
ZeRO-2 #217 (Merged)
jeffra merged 57 commits into master from zero2-staging
e67a3def Squased dev zero (#14)
110b7ad1 Support for new apex style optimizer.step(), grad_clip bug fix in Zer…
4e33696c Formatting fix
84f5ba6a Fix several unit tests, some still broken
ed8841f9 Adding activation checkpointing as deepspeed file
346c5c28 adding hash to deepspeed_checkpointing
30c4f6c5 formatting
22572064 zero stage 2 does not support grad accu, also fix formatting
1ce64def fix checkpoint tests, remove catch all try/except, increase pytest ti…
eb6f3c65 fix test_zero_static_scale unit test
6b3b2a8b disable empty partition test for now
e21e1350 remove conversion of loss to float before scaling (tests failing)
33224c77 update squad model tests to run zero2 and use bsz=6
4dca61e8 add squad zero2 config
0ca3bd5b Adding support for deepspeed_checkpointing through deepspeed.checkpoi…
720fab7b Removed redundant model tests. Added testing deepspeed activation che…
c0cb47ea return float conversion in backward, convert loss to float before gra…
4f433ee0 Refactored deepspeed.checkpointing API to pass ds_config directly to …
aebd1f29 fixing test paths
dace62ce Optional loading of optimizer and learning rate scheduler states in
16e82024 Fix formatting issues
54f47d42 Strict option for checkpoint loading
ac8a526d Enable loading checkpoints without optimizer state with different DP …
9d0b194a Fix bug
21cb8217 Updating Megatron Tutorial
72210701 replacing perf section in Megatron Tutorial
8a555b20 megatron tutorial updates and activation checkpointing json configs
591bdc63 Documentation : Added code comments to deepspeed.checkpointing \ Adde…
0ce25619 Updating Megatron Tutorial
5865b9a5 replacing perf section in Megatron Tutorial
624eca51 getting docs to build
4e391da5 getting docs to build
2ee9706d formatting
3648a87b Addressing Jeff and Shaden's feedback on documentation
d0940a03 add compute & communication overlapping
0b44ab68 fix the format error
9131be09 update DSE submodule to point to DS 0.2 version
1c17fa77 update according to code review feedback
e85d3c3d check contigious_gradients before using previous_reduced_grads
b12eb2e6 Adding more documentations, in features.md and index.md (#33)
399cc2da update nav bar docs
8d8290ae change ordering of tutorials
b68929e7 reorder nav
8f7322cd contigious -> contiguous
fd111c0f fixing my merge handiwork (#43)
4f9e6502 updating code docs (#42)
e1504d81 Web/doc edits (#45)
16f96b2d Adding back previous zero optimization (bool) (#44)
219e6220 update DSE commit
28f01e0e Merge branch 'master' into zero2-staging
2a915cd4 few sentences on low bandwidth clusters in Megatron Tutorial (#46)
8c4f9405 Merge branch 'dev-zero-may1' into zero2-staging
1ab5961c news updates
914aa356 bump to v0.2.0, ignore *.log, use cache_dir in megatron tests
c352f164 blurb for news items
c7cbcdac update
02eb292b blurb updates
jeffra merged commit f2ac7eaf into master 5 years ago
jeffra deleted the zero2-staging branch 5 years ago
Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Milestone: No milestone
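Several commits in this PR touch how ZeRO is configured: "add squad zero2 config", "zero stage 2 does not support grad accu", and "Adding back previous zero optimization (bool)". The sketch below illustrates how a config interpreted under those rules might be normalized and checked. It is a hypothetical helper, not DeepSpeed code: the name `normalize_zero_config` is invented, mapping the legacy boolean `"zero_optimization": true` to stage 1 is an assumption, and the `contiguous_gradients` / `overlap_comm` key names are taken from later DeepSpeed documentation and should be checked against the v0.2.0 docs.

```python
from typing import Any, Dict


def normalize_zero_config(ds_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper (not part of DeepSpeed).

    Returns a copy of a DeepSpeed-style config dict in which
    "zero_optimization" is always a dict with an integer "stage".
    """
    config = dict(ds_config)  # shallow copy so the caller's dict is untouched
    zero = config.get("zero_optimization", False)

    if isinstance(zero, bool):
        # Assumption: the legacy boolean flag means stage 1 when True, off when False.
        zero = {"stage": 1 if zero else 0}
    else:
        zero = dict(zero)  # copy the nested dict too before modifying it
        zero.setdefault("stage", 0)

    if zero["stage"] not in (0, 1, 2):
        raise ValueError(f"unsupported ZeRO stage: {zero['stage']}")

    # Mirrors the commit "zero stage 2 does not support grad accu":
    # stage 2 is rejected when gradient accumulation is requested.
    if zero["stage"] == 2 and config.get("gradient_accumulation_steps", 1) > 1:
        raise ValueError("ZeRO stage 2 does not support gradient accumulation here")

    config["zero_optimization"] = zero
    return config


# Illustrative stage-2 config; the values are placeholders, not the PR's settings.
zero2_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "contiguous_gradients": True,  # key name assumed from later DeepSpeed docs
        "overlap_comm": True,          # key name assumed from later DeepSpeed docs
    },
}

print(normalize_zero_config(zero2_config)["zero_optimization"]["stage"])  # 2
print(normalize_zero_config({"zero_optimization": True})["zero_optimization"])  # {'stage': 1}

try:
    normalize_zero_config({"gradient_accumulation_steps": 4, "zero_optimization": {"stage": 2}})
except ValueError as err:
    print(err)
```

The gradient-accumulation check encodes only the restriction stated in commit 22572064 at the time of this PR; later DeepSpeed releases may not share it, so treat it as a snapshot rather than a general rule.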