[FSDP2] Replaced version-ctx with `no_grad`; removed `no_grad` (#119550)
This PR replaces the `_unsafe_preserve_version_counters` context with a simple `torch.no_grad()` context. This decreases CPU overhead from (1 context enter/exit + a loop over `N` tensors) to just (1 context enter/exit).
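As a rough illustration of the overhead difference only, here is a toy pure-Python sketch. It does not use or model PyTorch: `ToyTensor`, `preserve_versions`, `flat_context`, and `run` are hypothetical names invented for this example, and the flat context makes no attempt to mimic `torch.no_grad()` semantics. It just counts the bookkeeping steps each style of context performs.

```python
from contextlib import contextmanager


class ToyTensor:
    """Hypothetical stand-in for a tensor that carries a version counter."""

    def __init__(self) -> None:
        self.version = 0

    def copy_in_place(self) -> None:
        # In this toy model, an in-place write bumps the version counter.
        self.version += 1


@contextmanager
def preserve_versions(tensors, stats):
    """Toy per-tensor context: save every version on enter, restore on exit."""
    stats["enter_exit"] += 1
    saved = [t.version for t in tensors]  # loop over N tensors on enter
    stats["per_tensor_steps"] += len(tensors)
    try:
        yield
    finally:
        for t, v in zip(tensors, saved):  # loop over N tensors on exit
            t.version = v
        stats["per_tensor_steps"] += len(tensors)


@contextmanager
def flat_context(stats):
    """Toy constant-cost context: one enter/exit, no per-tensor bookkeeping."""
    stats["enter_exit"] += 1
    yield


def run(n):
    """Run the same in-place writes under both contexts and return the counts."""
    old = {"enter_exit": 0, "per_tensor_steps": 0}
    new = {"enter_exit": 0, "per_tensor_steps": 0}
    tensors = [ToyTensor() for _ in range(n)]
    with preserve_versions(tensors, old):
        for t in tensors:
            t.copy_in_place()
    with flat_context(new):
        for t in tensors:
            t.copy_in_place()
    return old, new
```

In this sketch the per-tensor context does bookkeeping proportional to `N` on top of its single enter/exit, while the flat context does only the enter/exit.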
This PR also removes a `torch.no_grad()` from `init_unsharded_param`, since removing it helps compilation and does not affect eager mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119550
Approved by: https://github.com/Skylion007