DeepSpeed
Universal checkpoint for zero stage 3
#5475
Merged

xylian86 Enable uni_ckpt loading for z3
2ce408fb
xylian86 Enable uni_ckpt converting for z3
6eb20f5e
xylian86 Enable uni_ckpt loading for z3 with zero-infinity
0e6fcdb8
xylian86 requested a review from tjruwase 1 year ago
xylian86 requested a review from mrwyattii 1 year ago
tjruwase removed review request from mrwyattii 1 year ago
tjruwase requested a review from samadejacobs 1 year ago
tjruwase requested a review from tohtana 1 year ago
tjruwase requested a review from lekurile 1 year ago
tohtana commented on 2024-05-04
tjruwase Merge branch 'master' into xylian_uc_stage3
45d9d54e
samadejacobs commented on 2024-05-06
samadejacobs commented on 2024-05-06
xylian86 remove debug statement
31e584cd
xylian86 fix the naming issue
58c9e957
xylian86 print the progress info for ZeRO 3
82473818
xylian86 Merge branch 'master' into xylian_uc_stage3
b27e1ac5
xylian86 Merge branch 'microsoft:master' into xylian_uc_stage3
502aef8f
xylian86 support extraction and merging in parallel
bb8dff63
xylian86 fix issues in corner cases
2aefb1fd
xylian86 formatting
426417a0
xylian86 include ZeRO3 to the universal checkpoint test cases
ca028e89
xylian86 requested a review from loadams 1 year ago
xylian86 Merge branch 'master' into xylian_uc_stage3
385042bf
xylian86 fix the compatible issue in universal checkpoint test for ZeRO3
4bd1e6fb
xylian86 Merge branch 'microsoft:master' into xylian_uc_stage3
a1ed7ed8
xylian86 the current test logic for optim state does not work for ZeRO3
764c04b8
tjruwase Merge branch 'master' into xylian_uc_stage3
e4435890
xylian86 fix the issues that cause the epilogue stuck
ff309f37
tohtana Merge branch 'master' into xylian_uc_stage3
9d37f585
tohtana approved these changes on 2024-06-24
tohtana enabled auto-merge 1 year ago
tjruwase Merge branch 'master' into xylian_uc_stage3
f4ed2207
tohtana merged d2b1d7fc into master 1 year ago
