[FSDP] Fix `no_sync()`, `use_orig_params=True`, mixed precision, sharded (#92874)
When an original parameter with 1D shape is fully assigned to one rank, `param.shape == view.shape` in `_use_unsharded_grad_views()`. In that case, we still need to check whether `param.dtype == view.dtype` and bypass the gradient assignment as necessary.
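A minimal sketch of the check and the `.data` bypass (not the actual FSDP code; `_assign_grad_view` is a hypothetical stand-in for the assignment done in `_use_unsharded_grad_views()`):

```python
import torch


def _assign_grad_view(param: torch.nn.Parameter, view: torch.Tensor) -> None:
    # Sketch: a direct `param.grad = view` fails PyTorch's shape/dtype
    # consistency check, so mismatches are routed through `.data`, which
    # skips that check.
    if param.shape != view.shape or param.dtype != view.dtype:
        if param.grad is None:
            param.grad = torch.empty_like(param)
        param.grad.data = view
    else:
        param.grad = view


# 1D parameter fully assigned to one rank: shapes match, but under mixed
# precision the dtypes may not, so the dtype half of the check is required.
p = torch.nn.Parameter(torch.zeros(8))     # fp32 original parameter
g = torch.ones(8, dtype=torch.float16)     # low-precision unsharded grad view
_assign_grad_view(p, g)
```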
The previous PR guarded the check with an additional `and not self.uses_sharded_strategy` because the unit test did not exercise the check for sharded strategies, and I was conservatively making a minimal fix. That was happenstance: the test simply had no 1D parameter fully assigned to one rank. Including the bias in the linear layer produces that case (see the sketch below), which makes removing the `and not self.uses_sharded_strategy` necessary.
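For reference, a hedged sketch (assuming at least 2 CUDA devices and NCCL; the actual unit test harness differs) of a configuration that reaches this case: a `Linear` bias under `FULL_SHARD` with `use_orig_params=True`, fp16 mixed precision, and gradient accumulation inside `no_sync()`:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)


def _worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # The bias is a small 1D parameter; with FULL_SHARD it can land entirely
    # within one rank's shard, giving the `param.shape == view.shape` case.
    model = torch.nn.Linear(8, 8, bias=True).cuda()
    fsdp_model = FSDP(
        model,
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        mixed_precision=MixedPrecision(param_dtype=torch.float16),
        use_orig_params=True,
    )

    inp = torch.randn(4, 8, device="cuda")
    with fsdp_model.no_sync():            # accumulate without reduce-scatter
        fsdp_model(inp).sum().backward()
    fsdp_model(inp).sum().backward()      # normal backward after no_sync()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # requires at least 2 CUDA devices
    mp.spawn(_worker, args=(world_size,), nprocs=world_size)
```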
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92874
Approved by: https://github.com/zhaojuanmao