[FSDP] Handle a state_dict that resides on CPU (#85640)
The state_dict may not be on a GPU. In that case its tensors must be moved to the compute_device before the ShardedTensor can be gathered.
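
As a rough illustration of the move described above, here is a minimal sketch. It is not the code from this PR: `move_state_dict_to_device` is a hypothetical helper, and it treats any value with a callable `.to(device)` method (as `torch.Tensor` has) as tensor-like. It does not cover ShardedTensor-specific handling such as moving local shards.

```python
from typing import Any, Dict


def move_state_dict_to_device(state_dict: Dict[str, Any], device: Any) -> Dict[str, Any]:
    """Return a shallow copy of state_dict with tensor-like values moved to device.

    Hypothetical helper for illustration only. A value is considered
    tensor-like if it exposes a callable ``.to(device)``; all other
    values (ints, strings, None, ...) are copied through unchanged.
    """
    moved: Dict[str, Any] = {}
    for key, value in state_dict.items():
        to = getattr(value, "to", None)
        # Move tensor-like values; leave everything else as-is.
        moved[key] = to(device) if callable(to) else value
    return moved
```

With PyTorch installed, a call might look like `move_state_dict_to_device(model.state_dict(), torch.device("cuda", torch.cuda.current_device()))`, where the target device stands in for FSDP's compute_device. The input dictionary itself is left unmodified.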
Differential Revision: [D39681730](https://our.internmc.facebook.com/intern/diff/D39681730/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85640
Approved by: https://github.com/rohan-varma, https://github.com/awgu