DeepSpeed
Fix the universal checkpoint issue for stage3 when there are multiple subgroups.
#7585
Open


zhengchenyu wants to merge 3 commits into deepspeedai:master from zhengchenyu:fix.universal
zhengchenyu
zhengchenyu Fix the universal checkpoint issue for stage3 when there are multiple…
4fd8dbdf
zhengchenyu requested a review from tjruwase 2 days ago
zhengchenyu requested a review from tohtana 2 days ago
zhengchenyu requested a review from loadams 2 days ago
sfc-gh-truwase requested a review from sfc-gh-truwase 1 day ago
sfc-gh-truwase commented on 2025-09-23
sfc-gh-truwase commented on 2025-09-23
sfc-gh-truwase removed review request from tohtana 1 day ago
sfc-gh-truwase removed review request from loadams 1 day ago
sfc-gh-truwase
xylian86
zhengchenyu
xylian86
zhengchenyu
xylian86
xylian86 commented on 2025-09-23
zhengchenyu update
1e2eed41
sfc-gh-truwase Merge branch 'master' into fix.universal
813fd4f5
zhengchenyu
xylian86

Assignees
No one assigned