DeepSpeed
Correctness fix PP+ZeRO for gradient accumulation + updates from master
#1263
Merged

jeffra merged 5 commits into big-science from jeffra/big-science-patches
jeffra ignore overlap/contiguous_gradients if using zero 1 (#1246)
e5ecdf54
tjruwase Make round robin gradient partitioning configurable (default False) (…
5bb09f87
jeffra pass GAS boundary state from PP -> ZeRO
d370f535
tjruwase Use correct default for round robin gradients (#1258)
0067c88e
jeffra requested a review from awan-10 4 years ago
jeffra requested a review from cli99 4 years ago
jeffra requested a review from conglongli 4 years ago
jeffra requested a review from eltonzheng 4 years ago
jeffra requested a review from minjiaz 4 years ago
jeffra requested a review from niumanar 4 years ago
jeffra requested a review from RezaYazdaniAminabadi 4 years ago
jeffra requested a review from samyam 4 years ago
jeffra requested a review from ShadenSmith 4 years ago
jeffra requested a review from tjruwase 4 years ago
jeffra formatting
624303f2
ShadenSmith approved these changes on 2021-07-30
tjruwase commented on 2021-07-30
jeffra merged f93e22b3 into big-science 4 years ago
jeffra deleted the jeffra/big-science-patches branch 4 years ago
