DeepSpeed
Universal checkpoint for zero stage 1
#2284
Merged

tjruwase merged 44 commits into master from olruwase/zero_1_2_universal_ckpt
tjruwase Refactor universal checkpointing and tensor fragments
4b87f300
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
4317b846
tjruwase Formatting
dfc816df
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
21aa55a9
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
c6838919
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
89df0b3c
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
622e7ab8
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
115fe422
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
3c09adaa
tjruwase Merge branch 'master' into olruwase/refactor_universal_checkpoint
d40b9236
tjruwase Support zero stage1; Expand TP dim
7cf02358
tjruwase requested a review from jeffra 3 years ago
tjruwase requested a review from mrwyattii 3 years ago
tjruwase requested a review from samyam 3 years ago
tjruwase requested a review from ShadenSmith 3 years ago
tjruwase requested a review from conglongli 3 years ago
tjruwase requested a review from awan-10 3 years ago
tjruwase requested a review from cli99 3 years ago
tjruwase requested a review from eltonzheng 3 years ago
tjruwase requested a review from minjiaz 3 years ago
tjruwase requested a review from RezaYazdaniAminabadi 3 years ago
tjruwase requested a review from duli2012 3 years ago
tjruwase requested a review from yaozhewei 3 years ago
tjruwase requested a review from arashb 3 years ago
tjruwase requested a review from xiaoxiawu-microsoft 3 years ago
tjruwase requested a review from samadejacobs 3 years ago
tjruwase requested a review from cmikeh2 3 years ago
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
ece4ce37
tjruwase Remove debug prints
cae21725
tjruwase Merge branch 'olruwase/zero_1_2_universal_ckpt' of github.com:microso…
48b62c29
stas00
tjruwase closed this 3 years ago
tjruwase reopened this 3 years ago
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
1ace0257
tjruwase Detect sharded optimizer state
529f2d88
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
93246ece
tjruwase Merge master
45320590
tjruwase removed review request from arashb 3 years ago
tjruwase removed review request from ShadenSmith 3 years ago
tjruwase removed review request from cmikeh2 3 years ago
tjruwase removed review request from duli2012 3 years ago
tjruwase removed review request from samyam 3 years ago
tjruwase removed review request from conglongli 3 years ago
tjruwase removed review request from awan-10 3 years ago
tjruwase removed review request from samadejacobs 3 years ago
tjruwase removed review request from cli99 3 years ago
tjruwase removed review request from yaozhewei 3 years ago
tjruwase removed review request from eltonzheng 3 years ago
tjruwase removed review request from minjiaz 3 years ago
tjruwase removed review request from RezaYazdaniAminabadi 3 years ago
tjruwase removed review request from xiaoxiawu-microsoft 3 years ago
tjruwase Format fixes
024baa8a
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
a2f592d3
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
697287d0
tjruwase changed the title from "Universal checkpoint for zero stage 1 & 2" to "Universal checkpoint for zero stage 1" 3 years ago
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
a5e99007
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
bfefdec6
tjruwase requested a review from GuanhuaWang 3 years ago
tjruwase removed review request from GuanhuaWang 3 years ago
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
b78c5f64
tjruwase marked this pull request as draft 3 years ago
tjruwase Encode reshaping guide
e3465292
tjruwase Merge branch 'olruwase/zero_1_2_universal_ckpt' of github.com:microso…
1447fc23
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
1bebe2e9
tjruwase marked this pull request as ready for review 3 years ago
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
a5afb811
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
bb85c69b
tjruwase More symbolic constants
c929f89f
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
6ba5ad46
mrwyattii approved these changes on 2022-09-26
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
83ecf190
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
0f1738d6
mayank31398
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
6c6823fe
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
16d26b9c
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
a26458a8
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
48d291a7
mrwyattii Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
e358ee6d
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
4f51d0a9
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
f30ad0fe
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
88bf0452
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
329cb826
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
0d9dc104
tjruwase Merge branch 'master' into olruwase/zero_1_2_universal_ckpt
37485ae1
tjruwase merged 799120e7 into master 3 years ago
mrwyattii deleted the olruwase/zero_1_2_universal_ckpt branch 2 years ago
