DeepSpeed
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load
#1525
Merged
jeffra merged 15 commits into master from zero-ckpt-cpu-issue
jeffra requested a review from awan-10 4 years ago
jeffra requested a review from cli99 4 years ago
jeffra requested a review from conglongli 4 years ago
jeffra requested a review from eltonzheng 4 years ago
jeffra requested a review from minjiaz 4 years ago
jeffra requested a review from niumanar 4 years ago
jeffra requested a review from RezaYazdaniAminabadi 4 years ago
jeffra requested a review from samyam 4 years ago
jeffra requested a review from ShadenSmith 4 years ago
jeffra requested a review from tjruwase 4 years ago
tjruwase commented on 2021-11-05
tjruwase approved these changes on 2021-11-05
jeffra changed the title from "Reduce CPU memory overhead during ZeRO checkpoint loading" to "[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load" 4 years ago
jeffra force pushed to 45a416e1 4 years ago
0fc11fa0 [squash] zero-ckpt-cpu-issue (#1673)
dbd08236 formatting
jeffra force pushed from 09260b6c to dbd08236 3 years ago
92d87f0c Merge branch 'master' into zero-ckpt-cpu-issue
a6b6770f Reduce cpu memory of loading in rigid mode
21e173be Merge branch 'master' into zero-ckpt-cpu-issue
cd4ce852 Allocate tensor on param device
4b0d366a Merge branch 'zero-ckpt-cpu-issue' of github.com:microsoft/DeepSpeed …
tjruwase
commented on 2022-01-06
jeffra
commented on 2022-01-07
571b0a2c Merge branch 'master' into zero-ckpt-cpu-issue
a4b40fa7 add WS check + several unit tests for ckpting (TODO: need to fix a fe…
64975092 uncomment exception check in ckpt test
477dc89c Merge branch 'master' into zero-ckpt-cpu-issue
61bdfece Merge branch 'master' into zero-ckpt-cpu-issue
c13305f0 Merge branch 'master' into zero-ckpt-cpu-issue
091071de fixes for remaining unit tests
4add9306 Merge branch 'master' into zero-ckpt-cpu-issue
jeffra enabled auto-merge (squash) 3 years ago
Auto-merge disabled (manually disabled by user) 3 years ago
jeffra merged 3293cf72 into master 3 years ago
jeffra deleted the zero-ckpt-cpu-issue branch 3 years ago
Reviewers
tjruwase
awan-10
cli99
conglongli
eltonzheng
minjiaz
niumanar
RezaYazdaniAminabadi
samyam
ShadenSmith
Assignees
No one assigned
Labels
None yet
Milestone
No milestone