DeepSpeed
Enable tensor fragments for zero 2 & 3
#2727
Merged

tjruwase merged 83 commits into master from olruwase/tensor_fragments
tjruwase
tjruwase Enable tensor fragments for zero 2
bd0f7da9
tjruwase requested a review from jeffra 2 years ago
tjruwase requested a review from jomayeri 2 years ago
tjruwase requested a review from samyam 2 years ago
tjruwase requested a review from mrwyattii 2 years ago
tjruwase requested a review from awan-10 2 years ago
tjruwase Merge branch 'master' into olruwase/tensor_fragments
830570e8
stas00 commented on 2023-01-23
tjruwase Update deepspeed/utils/tensor_fragment.py
6acbd537
stas00 commented on 2023-01-23
tjruwase Update deepspeed/utils/tensor_fragment.py
ea4f2098
tjruwase Merge branch 'master' into olruwase/tensor_fragments
4b5373cc
stas00
tjruwase Merge branch 'master' into olruwase/tensor_fragments
026ff96d
stas00
tjruwase Support offload
9bd542c1
tjruwase Support offload
8a240260
tjruwase Merge branch 'master' into olruwase/tensor_fragments
b4b0f9f4
tjruwase Support multi-gpu
864b4c95
tjruwase Cleanup
43a0aed0
stas00
tjruwase WIP
c2730cc1
stas00 commented on 2023-01-26
stas00
stas00
stas00
tjruwase Merge branch 'master' into olruwase/tensor_fragments
923b2249
tjruwase Update deepspeed/runtime/zero/stage3.py
d1492458
tjruwase changed the title from "Enable tensor fragments for zero 2" to "Enable tensor fragments for zero 2 & 3" 2 years ago
tjruwase Master rebase
5837d162
tjruwase Merge branch 'master' into olruwase/tensor_fragments
187d166b
tjruwase Support padding
0f636c8b
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
5cfc6cc1
stas00 commented on 2023-01-26
tjruwase Update deepspeed/runtime/zero/stage3.py
458bf504
tjruwase z3 optimizer state support; aligned api
4738aacd
tjruwase Merge branch 'master' into olruwase/tensor_fragments
730c4590
tjruwase Support frozen z3 params
0f81194d
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
d15a7862
stas00
tjruwase Merge branch 'master' into olruwase/tensor_fragments
ad10c372
tjruwase
stas00
tjruwase Unit tests
59dad7ce
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
b903e631
tjruwase Merge branch 'master' into olruwase/tensor_fragments
8034117d
tjruwase Check NVMe offload capability
448fa96b
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
0c6ccdc5
tjruwase Formatting
f02e3be4
tjruwase Merge branch 'master' into olruwase/tensor_fragments
6cced794
tjruwase Docs
d2ca3e00
tjruwase Merge branch 'master' into olruwase/tensor_fragments
522d5dd4
tjruwase More docs
6c1217aa
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
686d18c0
tjruwase
tjruwase More docs
53022b0f
tjruwase Merge branch 'master' into olruwase/tensor_fragments
b06ec44d
stas00 commented on 2023-02-03
tjruwase Update docs/code-docs/source/zero3.rst
6d632cb7
stas00 commented on 2023-02-03
tjruwase More docs
d0c99612
tjruwase Update docs/code-docs/source/zero3.rst
9a1812ad
tjruwase More docs
a81a5435
stas00 commented on 2023-02-03
tjruwase More docs
7303e2cd
stas00 commented on 2023-02-03
tjruwase Update docs/code-docs/source/zero3.rst
2d5bdd6b
stas00 commented on 2023-02-04
tjruwase Update deepspeed/utils/tensor_fragment.py
e3fc9979
tjruwase Merge branch 'master' into olruwase/tensor_fragments
20984e42
tjruwase More docs
03d9f3fc
tjruwase Merge branch 'master' into olruwase/tensor_fragments
3d68746e
tjruwase Support unsharded fp32 grad
cb009987
tjruwase Merge branch 'master' into olruwase/tensor_fragments
e9aa4682
tjruwase Merge branch 'master' into olruwase/tensor_fragments
d45782ac
tjruwase Remove debug prints
7fa2010c
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
71238a4a
jeffra approved these changes on 2023-02-07
tjruwase Merge branch 'master' into olruwase/tensor_fragments
dd55ec1b
tjruwase Merge branch 'master' into olruwase/tensor_fragments
b107afc2
tjruwase Fix off-by-one detection of empty grads
e45c5683
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
94f1716a
tjruwase Merge branch 'master' into olruwase/tensor_fragments
e5d3b54b
stas00 commented on 2023-02-09
stas00 commented on 2023-02-09
stas00 commented on 2023-02-09
stas00 commented on 2023-02-09
stas00 commented on 2023-02-09
tjruwase Update deepspeed/utils/tensor_fragment.py
e68caab5
tjruwase Update deepspeed/utils/tensor_fragment.py
7e2cbca9
tjruwase Update deepspeed/utils/tensor_fragment.py
bb4a8d06
tjruwase Update deepspeed/runtime/zero/stage3.py
8acd7fb7
tjruwase Merge branch 'master' into olruwase/tensor_fragments
b8d7b878
SaulLu
stas00
SaulLu
tjruwase Merge branch 'master' into olruwase/tensor_fragments
5f96ddd7
stas00
stas00
tjruwase Merge branch 'master' into olruwase/tensor_fragments
1303bdd5
tjruwase Merge branch 'master' into olruwase/tensor_fragments
8a9c512d
tjruwase Merge branch 'master' into olruwase/tensor_fragments
d3e138ed
tjruwase Merge branch 'master' into olruwase/tensor_fragments
87797432
tjruwase Merge branch 'master' into olruwase/tensor_fragments
4fb6b0c4
tjruwase Merge branch 'master' into olruwase/tensor_fragments
0895e3c1
mrwyattii Merge branch 'master' into olruwase/tensor_fragments
137e325e
tjruwase Merge branch 'master' into olruwase/tensor_fragments
97586dcf
tjruwase commented on 2023-02-22
tjruwase Fix off-by-one error
4d1c9920
tjruwase Merge branch 'master' into olruwase/tensor_fragments
1b6003bd
tjruwase Skip ranks with no gradient data
0d3c8da9
tjruwase Formatting
3d40f64a
tjruwase Merge branch 'olruwase/tensor_fragments' of github.com:microsoft/Deep…
6ea17899
tjruwase Merge branch 'master' into olruwase/tensor_fragments
1d9174c7
tjruwase Merge branch 'master' into olruwase/tensor_fragments
f1efeca1
tjruwase Merge branch 'master' into olruwase/tensor_fragments
a56da9e8
tjruwase Merge branch 'master' into olruwase/tensor_fragments
9bfee830
stas00
tjruwase Merge branch 'master' into olruwase/tensor_fragments
5b790733
tjruwase Add license
4e87e1bb
tjruwase
stas00
stas00
stas00
tjruwase Fix license
84450f3a
tjruwase
tjruwase merged 541e423a into master 2 years ago
mrwyattii deleted the olruwase/tensor_fragments branch 2 years ago
