DeepSpeed
Fix the universal checkpoint issue for stage3 when there are multiple subgroups.
#7585
Open

zhengchenyu wants to merge 2 commits into deepspeedai:master from zhengchenyu:fix.universal
zhengchenyu Fix the universal checkpoint issue for stage3 when there are multiple…
4fd8dbdf
zhengchenyu requested a review from tjruwase 1 day ago
zhengchenyu requested a review from tohtana 1 day ago
zhengchenyu requested a review from loadams 1 day ago
sfc-gh-truwase requested a review from sfc-gh-truwase 1 day ago
sfc-gh-truwase commented on 2025-09-23
sfc-gh-truwase commented on 2025-09-23
sfc-gh-truwase removed review request from tohtana 1 day ago
sfc-gh-truwase removed review request from loadams 1 day ago
xylian86 commented on 2025-09-23
zhengchenyu update
1e2eed41


Assignees
No one assigned