[FSDP][2/N] Remove `params_with_grad` (#87480)
This PR removes the property `params_with_grad` from `FullyShardedDataParallel`. It was introduced while implementing `clip_grad_norm_()` but was never used consistently. I do not think `FullyShardedDataParallel` should expose this helper because it is not a common paradigm.
This PR is technically BC-breaking. However, I checked that there are no internal uses of this API.
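For any external caller that did rely on the property, a minimal sketch of a replacement follows. It assumes the property returned the module's parameters whose `.grad` is not `None`, which this message does not state. The helper name `get_params_with_grad` is hypothetical and not part of PyTorch.

```python
from typing import Any, List


def get_params_with_grad(module: Any) -> List[Any]:
    """Return the parameters of `module` whose `.grad` is populated.

    Hypothetical stand-in for the removed property. It assumes `module`
    has a `parameters()` method (as `torch.nn.Module` does) and that each
    parameter has a `.grad` attribute that is `None` until a gradient
    has been computed.
    """
    return [p for p in module.parameters() if p.grad is not None]
```

With an FSDP-wrapped model `fsdp_model`, a caller would replace `fsdp_model.params_with_grad` with `get_params_with_grad(fsdp_model)`.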
cc @ezyang @gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87480
Approved by: https://github.com/rohan-varma