Fix FSDP device_id when CPU offloading (#82892)
See https://github.com/pytorch/pytorch/issues/82891 for full context.
When FSDP is initialized with both device_id and CPU offload, it could crash if an outer FSDP unit did not manage any params of its own. In that case the outer unit would pick up a flat param belonging to a child FSDP module, check its device, see that it was on CPU, and throw an error.
The fix is to skip this device check when the param we hit is a flat param. This PR also fixes up the documentation of the function.
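
A rough, framework-free sketch of the idea, not the actual FSDP code. `Param`, `FlatParam`, `first_checkable_param`, and `validate_device` are hypothetical stand-ins introduced only for illustration, and the exact device comparison is an assumption:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Param:
    """Stand-in for a regular parameter; `device` is just a string here."""
    device: str


@dataclass
class FlatParam(Param):
    """Stand-in for a flat param owned by a child FSDP unit."""


def first_checkable_param(params: Iterable[Param]) -> Optional[Param]:
    """Return the first param that is not a flat param, or None."""
    for p in params:
        if isinstance(p, FlatParam):
            # Belongs to a child unit; with CPU offload it may
            # legitimately live on CPU, so do not check it.
            continue
        return p
    return None


def validate_device(params: Iterable[Param], device_id: str) -> None:
    """Raise if the first checkable param is not on `device_id`."""
    p = first_checkable_param(params)
    if p is None:
        # Outer unit manages no params of its own: nothing to check.
        return
    if p.device != device_id:
        raise ValueError(f"expected {device_id}, found {p.device}")


# Outer unit that only sees a child's offloaded flat param: no error.
validate_device([FlatParam("cpu")], "cuda:0")
```

Before the fix, the equivalent of `validate_device` would have inspected the child's flat param, seen CPU, and raised.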
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82892
Approved by: https://github.com/awgu