DeepSpeed
[model weights] zero_to_fp32 multiple improvements
#1181
Merged

tjruwase merged 21 commits into deepspeedai:master from stas00:z2fp32-auto
stas00 add live zero checkpoint to fp32 consolidation version
1ff9e2ea
stas00 requested a review from arashashari 4 years ago
stas00 requested a review from awan-10 4 years ago
stas00 requested a review from cli99 4 years ago
stas00 requested a review from conglongli 4 years ago
stas00 requested a review from eltonzheng 4 years ago
stas00 requested a review from jeffra 4 years ago
stas00 requested a review from minjiaz 4 years ago
stas00 requested a review from niumanar 4 years ago
stas00 requested a review from RezaYazdaniAminabadi 4 years ago
stas00 requested a review from samyam 4 years ago
stas00 requested a review from ShadenSmith 4 years ago
stas00 requested a review from tjruwase 4 years ago
stas00 some more docs
ad578e1c
stas00 zero2 model states uses a different filename
1acd0741
stas00 fix
3f25d286
stas00 Merge remote-tracking branch 'origin/master' into z2fp32-auto
3b3282fb
stas00 make debug mode cli configurable
6bef851a
stas00 copy the script only on node 0 process 0
440e298d
stas00 validate that we have the right number of files
673f37b7
stas00 revamp _get_zero_param_shapes, instrument with easier debug
17d6a200
stas00 changed the title from "[WIP] [model weights] add live zero checkpoint to fp32 consolidation version" to "[WIP] [model weights] zero_to_fp32 multiple improvements" 4 years ago
stas00 correct assertion
88e48820
stas00 rename API; add even simpler API
50d42c39
stas00 style
12e61a0f
stas00 docs improve
de79132f
stas00 changed the title from "[WIP] [model weights] zero_to_fp32 multiple improvements" to "[model weights] zero_to_fp32 multiple improvements" 4 years ago
stas00 update the docs
24476cf1
tjruwase Merge branch 'master' into z2fp32-auto
c86be9aa
stas00 commented on 2021-07-09
tjruwase Merge branch 'master' into z2fp32-auto
99d25fb3
stas00 commented on 2021-07-09
stas00 Merge remote-tracking branch 'origin/master' into z2fp32-auto
ee7d6a7a
tjruwase Merge branch 'master' into z2fp32-auto
6bc3ce21
tjruwase Merge branch 'master' into z2fp32-auto
4b52e401
tjruwase approved these changes on 2021-07-12
stas00 revert the unpartitioned_params detection and report as it's most lik…
70adb32a
tjruwase Merge branch 'master' into z2fp32-auto
0dd33e9d
tjruwase merged 2a921069 into master 4 years ago
stas00 deleted the z2fp32-auto branch 4 years ago
