parallelize writing of layer checkpoint files across data parallel instances #1419
adammoody force pushed from 1ac98950 to 9fbeb42f 4 years ago
adammoody changed the title from "WIP: parallelize layer checkpoints across data parallel instances" to "parallelize writing of layer checkpoint files across data parallel instances" 4 years ago
adammoody force pushed from 9fbeb42f to 1cee52dd 3 years ago
parallelize layer checkpoints across data parallel groups
8fef9f6c
adammoody force pushed from 1cee52dd to 8fef9f6c 3 years ago
use partition_uniform to determine start/end index values
c64a7d4e
formatting fix
6f8c9d1e
Merge branch 'master' into layerckpt
fa99397f
Merge branch 'master' into layerckpt
f05dc913
config: add option for parallel write of layer checkpoints in pipelin…
ed8bc48e
adammoody force pushed from e6a45fd6 to ed8bc48e 3 years ago
yapf fixes
92f6a840
enable parallel layer write according to config param
2dbf0a4f
avoid extraneous makedir when rank 0 writes all layers
2f311e99
Merge branch 'master' into layerckpt
27002cfa
tjruwase approved these changes on 2022-10-10
Merge branch 'master' into layerckpt
f54324a5
Merge branch 'master' into layerckpt
6d5518b3
Merge branch 'master' into layerckpt
0e4d92b7
tjruwase merged b8fb9c3f into master 3 years ago
adammoody deleted the layerckpt branch 3 years ago
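
For readers skimming the commit titles above, the idea of splitting layer checkpoint writes across data parallel ranks with a uniform partition can be sketched roughly as follows. This is an illustrative stand-in, not the code from this PR: `uniform_bounds` and `layers_for_rank` are hypothetical names, and the real `partition_uniform` helper may distribute any remainder differently.

```python
from typing import List


def uniform_bounds(num_items: int, num_parts: int) -> List[int]:
    """Hypothetical helper: return num_parts + 1 boundaries that split
    num_items into contiguous chunks whose sizes differ by at most one."""
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    base, extra = divmod(num_items, num_parts)
    bounds = [0]
    for part in range(num_parts):
        # The first `extra` parts each take one additional item.
        bounds.append(bounds[-1] + base + (1 if part < extra else 0))
    return bounds


def layers_for_rank(num_layers: int, dp_rank: int, dp_size: int) -> range:
    """Hypothetical helper: layer indices that one data parallel rank
    would write, taken from the start/end values of the uniform split."""
    bounds = uniform_bounds(num_layers, dp_size)
    return range(bounds[dp_rank], bounds[dp_rank + 1])


if __name__ == "__main__":
    # Assumed example sizes: 10 layers, 4 data parallel ranks.
    for rank in range(4):
        print(rank, list(layers_for_rank(10, rank, 4)))
```

Under this sketch every layer is assigned to exactly one rank, and a rank receives an empty range when there are more ranks than layers, so it would simply write nothing.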