DeepSpeed
64defe65 - Parallel map step for `DistributedDataAnalyzer` map-reduce (#5291)

Commit
1 year ago
- Adds multi-CPU processing to the `DistributedDataAnalyzer` map operation (parallelism is set with the `num_workers` parameter). It uses one `SharedMemory` / `Manager` queue per metric, written to concurrently by the worker processes.
- Makes `write_buffer_to_file` in the `DistributedDataAnalyzer` reduce operation much faster by copying the output tensor to the CPU and detaching it.

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
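
The parallel map step described above can be illustrated with a minimal standard-library sketch. This is not the DeepSpeed implementation: the names `METRICS`, `map_worker`, and `parallel_map` are hypothetical, the metrics are toy functions, and the `fork` start method is an assumption made so the sketch runs without a `__main__` guard (Unix only). It shows only the general pattern of `num_workers` processes writing concurrently to one `Manager` queue per metric.

```python
import multiprocessing as mp

# Hypothetical toy metrics; the real analyzer's metric functions are not
# shown in the commit message.
METRICS = {
    "length": len,
    "total": sum,
}


def map_worker(worker_id, num_workers, dataset, queues):
    """Handle every num_workers-th sample, writing to each metric's queue."""
    for idx in range(worker_id, len(dataset), num_workers):
        sample = dataset[idx]
        for name, fn in METRICS.items():
            # Several workers write to the same per-metric queue concurrently.
            queues[name].put((idx, fn(sample)))


def parallel_map(dataset, num_workers=2):
    """Compute all metrics over dataset using num_workers processes."""
    # Assumption: "fork" is available (Unix); other start methods would
    # need the usual `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        # One Manager-backed queue per metric.
        queues = {name: manager.Queue() for name in METRICS}
        procs = [
            ctx.Process(target=map_worker,
                        args=(w, num_workers, dataset, queues))
            for w in range(num_workers)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        results = {}
        for name, q in queues.items():
            # Each queue holds exactly one entry per sample once workers exit.
            items = [q.get() for _ in range(len(dataset))]
            # Restore the original sample order before returning.
            results[name] = [value for _, value in sorted(items)]
        return results
```

Calling `parallel_map([[1, 2, 3], [4, 5], [6]], num_workers=2)` returns a dictionary with one list per metric, ordered by sample index. Tagging each value with its index lets the workers write in any order while the reader can still reassemble a deterministic result.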