[FSDP] Move the sharded_state_dict logic to the post hook to avoid OOM (#82613)
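The original implementation called `_summon_full_params()` inside `state_dict()`. However, `state_dict()` recurses into submodules while that context is still active. As a result, the full parameters of nested FSDP instances are gathered at the same time, so `_summon_full_params()` effectively behaves like the recursive version even when recursion is disabled. This PR moves the logic into the `state_dict` post hook, so each FSDP instance gathers only its own parameters after its submodules have been processed, which resolves the OOM issue.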
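The memory effect can be illustrated with a toy model in plain Python. Everything in this sketch is hypothetical: `ToyModule`, `Memory`, `summon_full_params`, and the two `state_dict_*` functions are not FSDP or PyTorch APIs. Gathering is modeled only as a counter of "unsharded" sizes. The sketch assumes, as in `torch.nn.Module.state_dict()`, that a module's post hook runs after its children have been processed.

```python
from contextlib import contextmanager


class ToyModule:
    """Hypothetical stand-in for an FSDP-wrapped module (not the real FSDP class)."""

    def __init__(self, name, full_size, children=()):
        self.name = name
        self.full_size = full_size  # size of this module's own unsharded params
        self.children = list(children)


class Memory:
    """Tracks currently gathered (unsharded) parameter size and its peak."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def alloc(self, n):
        self.current += n
        self.peak = max(self.peak, self.current)

    def free(self, n):
        self.current -= n


@contextmanager
def summon_full_params(module, mem):
    # Non-recursive toy: gathers only this module's own parameters.
    mem.alloc(module.full_size)
    try:
        yield
    finally:
        mem.free(module.full_size)


def state_dict_gather_in_body(module, mem, out):
    # Old approach: gather first, then recurse while still gathered,
    # so ancestors' full params stay alive during children's gathers.
    with summon_full_params(module, mem):
        out[module.name] = module.full_size
        for child in module.children:
            state_dict_gather_in_body(child, mem, out)
    return out


def state_dict_gather_in_post_hook(module, mem, out):
    # New approach: children are processed first; the "post hook"
    # then gathers only this module's params and frees them right away.
    for child in module.children:
        state_dict_gather_in_post_hook(child, mem, out)
    with summon_full_params(module, mem):
        out[module.name] = module.full_size
    return out


root = ToyModule("root", 4, [
    ToyModule("a", 3, [ToyModule("a1", 2)]),
    ToyModule("b", 5),
])

mem_old, mem_new = Memory(), Memory()
sd_old = state_dict_gather_in_body(root, mem_old, {})
sd_new = state_dict_gather_in_post_hook(root, mem_new, {})
print(mem_old.peak, mem_new.peak)  # 9 5
```

Both variants produce the same entries, but in the first one the peak is the sum of full sizes along the deepest nesting chain (4 + 3 + 2), while in the post-hook variant it is only the largest single module (5).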
Differential Revision: [D38329396](https://our.internmc.facebook.com/intern/diff/D38329396/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82613
Approved by: https://github.com/rohan-varma