DeepSpeed
Fix scaling and allgather with `torch.autocast`
#7534
Merged

tohtana merged 12 commits into master from tohtana/fix_autocast_scaler
tohtana requested a review from tjruwase 23 days ago
tohtana force pushed from 5bdc312b to 5173baa1 23 days ago
tohtana requested a review from loadams 23 days ago
sfc-gh-truwase commented on 2025-09-02
tohtana use scaler only for fp16
f13ab493
tohtana fix autocast for z3 allgather
1a9bee1a
sfc-gh-truwase ZeRO3: Improve mismatch detection (#7525)
daca8f11
digger-yu fix typo s/1014 /1024 (#7528)
13aed49b
stas00 undo the revert (#7536)
e2436298
stas00 [logging] less startup noise (#7526)
968d0a62
stas00 [doc] fixing moe tutorial (#7538)
18b66e4c
jakehemmerle docs typo: `lrrt.md`, reference to `cycle_min_lr` should be `cycle_ma…
159abbf3
qibin0506 fixed DeepSpeedCPULion with ZeRO-Offload bug (#7531)
be13e1b8
tohtana remove memory allocation of unused buffer
b4d8650b
tohtana update comment
f2f77803
tohtana force pushed from 97a71fc9 to f2f77803 22 days ago
tohtana requested a review from jomayeri 22 days ago
sfc-gh-truwase Merge branch 'master' into tohtana/fix_autocast_scaler
c4b7b2ff
sfc-gh-truwase approved these changes on 2025-09-03
tohtana enabled auto-merge (squash) 22 days ago
tohtana merged 1e183a6a into master 22 days ago
tohtana deleted the tohtana/fix_autocast_scaler branch 22 days ago
