PR #217 ZeRO-2 - SemanticDiff

Squashed dev zero (#14)

e67a3def

Support for new apex style optimizer.step(), grad_clip bug fix in Zer…

110b7ad1

Formatting fix

4e33696c

Fix several unit tests, some still broken

84f5ba6a

Adding activation checkpointing as deepspeed file

ed8841f9

adding hash to deepspeed_checkpointing

346c5c28

formatting

30c4f6c5

zero stage 2 does not support grad accu, also fix formatting

22572064

fix checkpoint tests, remove catch all try/except, increase pytest ti…

1ce64def

fix test_zero_static_scale unit test

eb6f3c65

disable empty partition test for now

6b3b2a8b

remove conversion of loss to float before scaling (tests failing)

e21e1350

update squad model tests to run zero2 and use bsz=6

33224c77

add squad zero2 config

4dca61e8

Adding support for deepspeed_checkpointing through deepspeed.checkpoi…

0ca3bd5b

Removed redundant model tests. Added testing deepspeed activation che…

720fab7b

return float conversion in backward, convert loss to float before gra…

c0cb47ea

Refactored deepspeed.checkpointing API to pass ds_config directly to …

4f433ee0

fixing test paths

aebd1f29

Optional loading of optimizer and learning rate scheduler states in

dace62ce

Fix formatting issues

16e82024

Strict option for checkpoint loading

54f47d42

Enable loading checkpoints without optimizer state with different DP …

ac8a526d

Fix bug

9d0b194a

Updating Megatron Tutorial

21cb8217

replacing perf section in Megatron Tutorial

72210701

megatron tutorial updates and activation checkpointing json configs

8a555b20

Documentation : Added code comments to deepspeed.checkpointing \ Adde…

591bdc63

Updating Megatron Tutorial

0ce25619

replacing perf section in Megatron Tutorial

5865b9a5

getting docs to build

624eca51

getting docs to build

4e391da5

formatting

2ee9706d

Addressing Jeff and Shaden's feedback on documentation

3648a87b

add compute & communication overlapping

d0940a03

fix the format error

0b44ab68

update DSE submodule to point to DS 0.2 version

9131be09

update according to code review feedback

1c17fa77

check contigious_gradients before using previous_reduced_grads

e85d3c3d

Adding more documentations, in features.md and index.md (#33)

b12eb2e6

update nav bar docs

399cc2da

change ordering of tutorials

8d8290ae

reorder nav

b68929e7

contigious -> contiguous

8f7322cd

fixing my merge handiwork (#43)

fd111c0f

updating code docs (#42)

4f9e6502

Web/doc edits (#45)

e1504d81

Adding back previous zero optimization (bool) (#44)

16f96b2d

update DSE commit

219e6220

Merge branch 'master' into zero2-staging

28f01e0e

few sentences on low bandwidth clusters in Megatron Tutorial (#46)

2a915cd4

Merge branch 'dev-zero-may1' into zero2-staging

8c4f9405

news updates

1ab5961c

bump to v0.2.0, ignore *.log, use cache_dir in megatron tests

914aa356

blurb for news items

c352f164

update

c7cbcdac

blurb updates

02eb292b

jeffra merged f2ac7eaf into master 6 years ago

jeffra deleted the zero2-staging branch 5 years ago

DeepSpeed
ZeRO-2
#217

Merged
