pytorch
a749180e - Enable ncclAvg for reductions (#62303)

Commit
3 years ago
Enable ncclAvg for reductions (#62303)

Summary:
[ncclAvg](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html?highlight=ncclavg#c.ncclAvg) is a new `ncclRedOp_t` that fuses a division by world size into ncclAllReduce, ncclReduce, or ncclReduceScatter. This PR adds support for it.

This PR and https://github.com/pytorch/pytorch/pull/62140 lay the foundation for DDP to allreduce and average grad tensors in place with a single NCCL call, without additional memory passes to flatten, average, or unflatten. I'll write the necessary DDP changes once this PR and https://github.com/pytorch/pytorch/pull/62140 land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62303

Reviewed By: soulitzer

Differential Revision: D30095246

Pulled By: rohan-varma

fbshipit-source-id: d3a3475345fafb0ab265c11d36db74d7c5613a0a
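To make the "fused div-by-world-size" idea concrete, here is a minimal pure-Python sketch of the semantics only. It does not call NCCL or PyTorch. The function names `allreduce_sum_then_divide` and `allreduce_avg` are hypothetical and exist just for this illustration, and each inner list stands in for one rank's gradient buffer.

```python
from typing import List


def allreduce_sum_then_divide(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Two-pass baseline: a sum allreduce followed by a separate division pass."""
    world_size = len(rank_buffers)
    # Pass 1: elementwise sum across ranks (what a sum allreduce produces).
    summed = [sum(vals) for vals in zip(*rank_buffers)]
    # Pass 2: an extra sweep over memory to divide by the world size.
    averaged = [s / world_size for s in summed]
    # Every rank ends up with the same result.
    return [list(averaged) for _ in range(world_size)]


def allreduce_avg(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Fused variant: the division is folded into the reduction itself."""
    world_size = len(rank_buffers)
    averaged = [sum(vals) / world_size for vals in zip(*rank_buffers)]
    return [list(averaged) for _ in range(world_size)]


# Four simulated ranks, each holding a two-element gradient buffer.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fused = allreduce_avg(grads)
baseline = allreduce_sum_then_divide(grads)
```

Both paths give every rank the same averaged buffer. The difference the commit message points at is that the fused form avoids the second pass over memory.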