DeepSpeed
Big science fix passing multiple tensors
#1400
Merged

thomasw21
Minor tweaks to support Megatron 2.4 + DS 3D
db017fd7
pipe partitioning
407ff0f1
re-enable grad buffer partitioning
a096d32d
tjruwase Avoid partitioning small activations
9b4093b3
tjruwase Merge pull request #4 from ShadenSmith/olruwase/partition_activation
182be7b5
send/recv
3e948df1
isend/irecv missing wait
b6a2cb37
turn off async ops
6bb63b85
Merge branch 'megatron2.4-3d-sendrecv' into megatron2.4-3d
80976905
less verbose load
bd9e9539
jeffra Merge branch 'master' into megatron2.4-3d
081ddb5f
jeffra added shaden's set_train_batch_size patches, plus formatting
d26c258b
Adds engine.was_step_applied() (#1251)
9dbfdbd4
Cleaning up tensor/pipe parallel accounting. (#1252)
d6945dea
jeffra Correctness fix PP+ZeRO for gradient accumulation + updates from mast…
f93e22b3
jeffra dont clear grads in stage 1 code path
e9b5dffa
jeffra prevent none grads from being reduced
4b354096
jeffra fix empty grad zero tests
bc17042a
tjruwase Use mpu in DeepSpeedConfig() call (#1271)
6b428821
tjruwase API for obtaining global gradient norm (#1292)
cce85b89
stas00 turn excessive noise off (#1293)
e65e511b
jeffra [zero] restore fp16 params if no zero ckpts available (#1322)
db2f8a03
tjruwase Fix PP checkpoint bloat (#1324)
72ce55ab
stas00 update for cuda-11.4 (#1329)
c7f3bc51
thomasw21 Try something out
ddaa4061
thomasw21 Woops
b57a10bb
thomasw21 Make deepspeed pass any types of dtypes between stages
a7cca980
thomasw21 Woops
2c5d1e4d
thomasw21 Woops 2
d6f7b006
thomasw21 Woops 3
33e24717
thomasw21 Try debugging deadlock
4d1b0098
ghost
thomasw21 Fix dtype
ab64b540
hyunwoongko
thomasw21
Fix some more things
8c29337a
Woops
e34994ed
thomasw21 Use list comprehension instead of for loops, and increase the number …
2c55a32f
thomasw21 force pushed to 2c55a32f 4 years ago
thomasw21 changed the title from "WIP: Big science fix passing multiple tensors" to "Big science fix passing multiple tensors" 4 years ago
thomasw21 marked this pull request as ready for review 4 years ago
thomasw21 requested a review from awan-10 4 years ago
thomasw21 requested a review from cli99 4 years ago
thomasw21 requested a review from conglongli 4 years ago
thomasw21 requested a review from eltonzheng 4 years ago
thomasw21 requested a review from jeffra 4 years ago
thomasw21 requested a review from minjiaz 4 years ago
thomasw21 requested a review from niumanar 4 years ago
thomasw21 requested a review from RezaYazdaniAminabadi 4 years ago
thomasw21 requested a review from samyam 4 years ago
thomasw21 requested a review from ShadenSmith 4 years ago
thomasw21 requested a review from tjruwase 4 years ago
stas00 approved these changes on 2021-10-05
thomasw21 Run pre-commit
456c49af
thomasw21 Use ValueError + error msg instead of NotImplementedError
c1876455
thomasw21 changed the base branch from big-science to master 4 years ago
thomasw21 changed the base branch from master to big-science 4 years ago
thomasw21 Merge remote-tracking branch 'origin/master' into big-science-fix-pas…
48dc9133
thomasw21 changed the base branch from big-science to master 4 years ago
thomasw21 Use tuples instead of lists
7158a210
jeffra Merge branch 'master' into big-science-fix-passing-multiple-tensors
e2c875a5
tjruwase Merge branch 'master' into big-science-fix-passing-multiple-tensors
0d8daac0
thomasw21 Make sure to set as input a tensor when required, instead of a tuple …
673a3267
thomasw21 Merge branch 'big-science-fix-passing-multiple-tensors' of github.com…
1c3dee52
thomasw21 Update inputs as well
8060021b
tjruwase Merge branch 'master' into big-science-fix-passing-multiple-tensors
19de4aa0
ShadenSmith approved these changes on 2021-10-07
jeffra merged 9c672783 into master 4 years ago