parallelize writing of layer checkpoint files across data parallel instances (#1419)
* parallelize layer checkpoints across data parallel groups
* use partition_uniform to determine start/end index values
* formatting fix
* config: add option for parallel write of layer checkpoints in pipeline stage
* yapf fixes
* enable parallel layer write according to config param
* avoid extraneous checkpoint directory creation (makedirs) when rank 0 writes all layers
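The layer split described in the bullets above can be sketched as follows. This is a minimal illustration, not the DeepSpeed code. It assumes each data parallel rank saves one contiguous slice of the pipeline stage's layers, with boundaries computed uniformly in the spirit of `partition_uniform`. It also assumes that when parallel write is disabled, rank 0 writes every layer. `uniform_boundaries` and `layers_to_write` are hypothetical helpers introduced only for this sketch, and the remainder handling (the first ranks take one extra layer) is an assumption.

```python
from typing import List


def uniform_boundaries(num_items: int, num_parts: int) -> List[int]:
    """Hypothetical stand-in for partition_uniform: return num_parts + 1
    boundaries that split range(num_items) into contiguous chunks whose
    sizes differ by at most one (assumption: first chunks take the remainder)."""
    base, extra = divmod(num_items, num_parts)
    bounds = [0]
    for p in range(num_parts):
        bounds.append(bounds[-1] + base + (1 if p < extra else 0))
    return bounds


def layers_to_write(num_layers: int, dp_rank: int, dp_world_size: int,
                    parallel_write: bool) -> List[int]:
    """Hypothetical helper: indices of the layers this data parallel rank saves."""
    if not parallel_write:
        # Parallel write disabled: data parallel rank 0 writes every layer.
        return list(range(num_layers)) if dp_rank == 0 else []
    # Parallel write enabled: rank r writes layers [bounds[r], bounds[r + 1]).
    bounds = uniform_boundaries(num_layers, dp_world_size)
    return list(range(bounds[dp_rank], bounds[dp_rank + 1]))


if __name__ == "__main__":
    # 10 layers over 4 data parallel ranks -> slices of 3, 3, 2, 2 layers.
    for rank in range(4):
        print(rank, layers_to_write(10, rank, 4, parallel_write=True))
```

With parallel write enabled, every layer is saved exactly once and the per-rank write load differs by at most one layer. With it disabled, only rank 0 touches the filesystem.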
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>