DeepSpeed
b8fb9c3f - parallelize writing of layer checkpoint files across data parallel instances (#1419)

3 years ago
parallelize writing of layer checkpoint files across data parallel instances (#1419)

* parallelize layer checkpoints across data parallel groups
* use partition_uniform to determine start/end index values
* formatting fix
* config: add option for parallel write of layer checkpoints in pipeline stage
* yapf fixes
* enable parallel layer write according to config param
* avoid extraneous makedir when rank 0 writes all layers

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
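The commit message describes the idea without showing it: the layers in a pipeline stage are split into contiguous start/end ranges, one per data-parallel rank, and a config switch falls back to rank 0 writing everything. The following is a minimal Python sketch of that division of work under stated assumptions. `uniform_boundaries` and `layers_to_write` are hypothetical names, not DeepSpeed APIs. The even split with the remainder spread over the first chunks is an assumed scheme and is not necessarily how DeepSpeed's `partition_uniform` distributes leftover layers. The `parallel_write` argument stands in for the config option; its real key name is not taken from the source.

```python
from typing import List


def uniform_boundaries(num_items: int, num_parts: int) -> List[int]:
    """Hypothetical helper: split range(num_items) into num_parts contiguous
    chunks and return the num_parts + 1 boundary indices.

    Chunk sizes differ by at most one; the first (num_items % num_parts)
    chunks get one extra item. This is an assumed scheme, not necessarily
    DeepSpeed's exact partition_uniform behavior.
    """
    base, extra = divmod(num_items, num_parts)
    bounds = [0]
    for p in range(num_parts):
        bounds.append(bounds[-1] + base + (1 if p < extra else 0))
    return bounds


def layers_to_write(num_layers: int, dp_rank: int, dp_size: int,
                    parallel_write: bool) -> range:
    """Return the layer indices this data-parallel rank saves (hypothetical)."""
    if not parallel_write:
        # Serial mode: rank 0 writes every layer; other ranks write nothing.
        return range(num_layers) if dp_rank == 0 else range(0)
    bounds = uniform_boundaries(num_layers, dp_size)
    # Each rank owns the half-open slice [start, end).
    return range(bounds[dp_rank], bounds[dp_rank + 1])


if __name__ == "__main__":
    # 10 layers in one pipeline stage, 4 data-parallel replicas.
    for rank in range(4):
        print(rank, list(layers_to_write(10, rank, 4, parallel_write=True)))
```

With 10 layers and 4 replicas, the ranks receive layers 0-2, 3-5, 6-7 and 8-9, so every layer is written exactly once. In this sketch, a rank whose range is empty (for example, any rank other than 0 in serial mode) could skip creating the checkpoint directory. That is one plausible reading of the commit's note about avoiding an extraneous makedir.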