Universal checkpoint for zero stage 3 #5475
Enable uni_ckpt loading for z3
2ce408fb
Enable uni_ckpt converting for z3
6eb20f5e
Enable uni_ckpt loading for z3 with zero-infinity
0e6fcdb8
Merge branch 'master' into xylian_uc_stage3
45d9d54e
remove debug statement
31e584cd
fix the naming issue
58c9e957
print the progress info for ZeRO 3
82473818
Merge branch 'master' into xylian_uc_stage3
b27e1ac5
Merge branch 'microsoft:master' into xylian_uc_stage3
502aef8f
support extraction and merging in parallel
bb8dff63
fix issues in corner cases
2aefb1fd
formatting
426417a0
include ZeRO3 to the universal checkpoint test cases
ca028e89
Merge branch 'master' into xylian_uc_stage3
385042bf
fix the compatible issue in universal checkpoint test for ZeRO3
4bd1e6fb
Merge branch 'microsoft:master' into xylian_uc_stage3
a1ed7ed8
the current test logic for optim state does not work for ZeRO3
764c04b8
Merge branch 'master' into xylian_uc_stage3
e4435890
fix the issues that cause the epilogue stuck
ff309f37
Merge branch 'master' into xylian_uc_stage3
9d37f585
tohtana approved these changes on 2024-06-24
tohtana enabled auto-merge 1 year ago
Merge branch 'master' into xylian_uc_stage3
f4ed2207
tohtana merged d2b1d7fc into master 1 year ago