pytorch
a48f3059 - Corrected comments in fsdp (#80456)

Commit
2 years ago
Currently, the comments on the pre- and post-division steps in `FullyShardedDataParallel._post_backward_hook` state the following:

> Average grad by world_size for consistency with PyTorch DDP.

This does not match what actually happens: the pre-divide factor may or may not be equal to `world_size`. For example, for `world_size = 3`, `predivide_factor = 2`.

This PR clarifies the pre- and post-division comments in the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80456
Approved by: https://github.com/rohan-varma
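To make the `world_size = 3`, `predivide_factor = 2` example concrete, here is a minimal pure-Python sketch of how the division could be split into two steps. The helper names (`get_gradient_predivide_factor`, `split_division`, `average_in_two_steps`) and the power-of-two loop are assumptions for illustration, modeled on the behavior the commit message describes; they are not quoted from the FSDP source, and the reduction across ranks is simulated with a plain `sum`.

```python
def get_gradient_predivide_factor(world_size: int) -> float:
    """Hypothetical helper: pick a power-of-two pre-divide factor.

    The factor doubles while it still divides world_size and stays below
    the remaining quotient, so it is generally NOT equal to world_size.
    """
    factor = 1
    while world_size % factor == 0 and world_size / factor > factor:
        factor *= 2
    return float(factor)


def split_division(world_size: int) -> tuple[float, float]:
    """Return (pre-divide factor, post-divide factor).

    Their product is world_size, so dividing by both yields an average.
    """
    pre = get_gradient_predivide_factor(world_size)
    post = world_size / pre
    return pre, post


def average_in_two_steps(local_grads: list[float]) -> float:
    """Simulate the reduction: pre-divide per rank, sum, then post-divide."""
    world_size = len(local_grads)
    pre, post = split_division(world_size)
    # Each simulated rank divides its gradient before the sum (pre-division).
    reduced = sum(g / pre for g in local_grads)
    # The summed result is divided by the remaining factor (post-division).
    return reduced / post


if __name__ == "__main__":
    print(split_division(3))
    print(average_in_two_steps([3.0, 6.0, 9.0]))
```

Under these assumptions, neither step alone averages by `world_size`; only the two divisions together do, which is the point the corrected comments make.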
Author
Olga Andreeva