c0ed0f22 - [FSDP] Fix `no_sync()`, `use_orig_params=True`, mixed precision, sharded (#92874)

When an original parameter with a 1D shape is fully assigned to one rank, `param.shape == view.shape` in `_use_unsharded_grad_views()`. In that case, we still want to check whether `param.dtype == view.dtype` and bypass the view assignment as needed.

The previous PR guarded this check with an additional `and not self.uses_sharded_strategy` because the unit test did not exercise the check for sharded strategies, and I was conservatively making a minimal fix. That was happenstance: the test had no 1D parameter fully assigned to one rank. Including the bias in the linear layer covers that case, and removing the `and not self.uses_sharded_strategy` condition becomes necessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92874
Approved by: https://github.com/zhaojuanmao
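A minimal, self-contained sketch of the condition described above, not FSDP's actual `_use_unsharded_grad_views()` implementation; the names `bias`, `grad_view`, and `should_bypass_view` are illustrative. It shows why the dtype check must run even when shapes match: a 1D parameter (such as a linear bias) fully assigned to one rank has an unsharded gradient view of the same shape, but under mixed precision that view is in the low-precision dtype.

```python
import torch

# Simulate the case described above: a 1D bias fully assigned to one rank,
# so the unsharded gradient view has the same shape as the original
# parameter, while mixed precision keeps the view in a lower-precision dtype.
bias = torch.nn.Parameter(torch.zeros(8, dtype=torch.float32))
grad_view = torch.ones(8, dtype=torch.float16)  # low-precision gradient view

def should_bypass_view(param: torch.nn.Parameter, view: torch.Tensor) -> bool:
    """Return True when the direct `param.grad = view` assignment must be
    skipped. The shape check alone is not enough: with `use_orig_params=True`
    and mixed precision the shapes can match while the dtypes differ, and per
    the fix this applies to sharded strategies as well (no
    `and not self.uses_sharded_strategy` escape hatch).
    """
    return param.shape == view.shape and param.dtype != view.dtype

if should_bypass_view(bias, grad_view):
    pass  # bypass: leave `bias.grad` untouched rather than set a fp16 view
else:
    bias.grad = grad_view

print(should_bypass_view(bias, grad_view))  # True for this example
```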