DeepSpeed
Big science fix passing multiple tensors
#1400
Merged
jeffra merged 45 commits into deepspeedai:master from thomasw21:big-science-fix-passing-multiple-tensors
Minor tweaks to support Megatron 2.4 + DS 3D
db017fd7
pipe partitioning
407ff0f1
re-enable grad buffer partitioning
a096d32d
Avoid partitioning small activations
9b4093b3
Merge pull request #4 from ShadenSmith/olruwase/partition_activation
182be7b5
send/recv
3e948df1
isend/irecv missing wait
b6a2cb37
turn off async ops
6bb63b85
Merge branch 'megatron2.4-3d-sendrecv' into megatron2.4-3d
80976905
less verbose load
bd9e9539
Merge branch 'master' into megatron2.4-3d
081ddb5f
added shaden's set_train_batch_size patches, plus formatting
d26c258b
Adds engine.was_step_applied() (#1251)
9dbfdbd4
Cleaning up tensor/pipe parallel accounting. (#1252)
d6945dea
Correctness fix PP+ZeRO for gradient accumulation + updates from mast…
f93e22b3
dont clear grads in stage 1 code path
e9b5dffa
prevent none grads from being reduced
4b354096
fix empty grad zero tests
bc17042a
Use mpu in DeepSpeedConfig() call (#1271)
6b428821
API for obtaining global gradient norm (#1292)
cce85b89
turn excessive noise off (#1293)
e65e511b
[zero] restore fp16 params if no zero ckpts available (#1322)
db2f8a03
Fix PP checkpoint bloat (#1324)
72ce55ab
update for cuda-11.4 (#1329)
c7f3bc51
Try something out
ddaa4061
Woops
b57a10bb
Make deepspeed pass any types of dtypes between stages
a7cca980
Woops
2c5d1e4d
Woops 2
d6f7b006
Woops 3
33e24717
Try debugging deadlock
4d1b0098
Fix dtype
ab64b540
Fix some more things
8c29337a
Woops
e34994ed
Use list comprehension instead of for loops, and increase the number …
2c55a32f
thomasw21 force pushed to 2c55a32f 4 years ago
thomasw21 changed the title from "WIP: Big science fix passing multiple tensors" to "Big science fix passing multiple tensors" 4 years ago
thomasw21 marked this pull request as ready for review 4 years ago
thomasw21 requested a review from awan-10, cli99, conglongli, eltonzheng, jeffra, minjiaz, niumanar, RezaYazdaniAminabadi, samyam, ShadenSmith, and tjruwase 4 years ago
stas00
approved these changes on 2021-10-05
Run pre-commit
456c49af
Use ValueError + error msg instead of NotImplemetedError
c1876455
thomasw21 changed the base branch from big-science to master 4 years ago
thomasw21 changed the base branch from master to big-science 4 years ago
Merge remote-tracking branch 'origin/master' into big-science-fix-pas…
48dc9133
thomasw21 changed the base branch from big-science to master 4 years ago
Use tuples instead of lists
7158a210
Merge branch 'master' into big-science-fix-passing-multiple-tensors
e2c875a5
Merge branch 'master' into big-science-fix-passing-multiple-tensors
0d8daac0
Make sure to set as input a tensor when required, instead of a tuple …
673a3267
Merge branch 'big-science-fix-passing-multiple-tensors' of github.com…
1c3dee52
Update inputs as well
8060021b
Merge branch 'master' into big-science-fix-passing-multiple-tensors
19de4aa0
ShadenSmith
approved these changes on 2021-10-07
jeffra merged 9c672783 into master 4 years ago
Reviewers
stas00
ShadenSmith
hyunwoongko
awan-10
cli99
conglongli
eltonzheng
jeffra
minjiaz
niumanar
RezaYazdaniAminabadi
samyam
tjruwase