DeepSpeed
Improve universal checkpoint
#5289
Merged

tohtana
run conversion script
7684ae60
add no-parallel path
1ada96df
improve restoring step from universal checkpoint
fe61652c
sort keys when comparing state dicts
2b151166
add unit test for universal checkpoint
575665bf
fix loading with load_optimizer_states=False
c8c0498f
add dp scaling test
8d2dbaad
remove pad for comparison
3547b4eb
refactor test conditions
2b6f6945
Merge branch 'master' into tohtana/unit_test_univ_cp
099133cd
fix for torch adam
ccfba1a7
tohtana marked this pull request as ready for review 1 year ago
tohtana requested a review from tjruwase 1 year ago
tohtana requested a review from mrwyattii 1 year ago
tohtana requested a review from loadams 1 year ago
tjruwase commented on 2024-03-18
simplify argument
24484ad9
tjruwase approved these changes on 2024-03-18
fix for optimizer that doesn't have step in optimizer states
13effa18
add api to load global state to BF16 optimizer for compatibility
101e90c0
restore all fields in param group
76aca755
tjruwase commented on 2024-03-20
Merge branch 'master' into tohtana/unit_test_univ_cp
c4b2aaab
Merge branch 'master' into tohtana/unit_test_univ_cp
140d704e
move loading function to ZeROOptimizer
9909bd31
refactor to avoid circular import
5114233e
fix format
301e79e5
fix method calls
802397d6
tohtana enabled auto-merge 1 year ago
tohtana merged c56a4b9e into master 1 year ago

