Correctness fix PP+ZeRO for gradient accumulation + updates from master #1263
ignore overlap/contiguous_gradients if using zero 1 (#1246)
e5ecdf54
Make round robin gradient partitioning configurable (default False) (…
5bb09f87
pass GAS boundary state from PP -> ZeRO
d370f535
Use correct default for round robin gradients (#1258)
0067c88e
formatting
624303f2
jeffra merged f93e22b3 into big-science 4 years ago
jeffra deleted the jeffra/big-science-patches branch 4 years ago