move inputs to device on root module only (#91078)
1. There is no need to move inputs/activations to the device for every nested FSDP instance; moving them once on the root module is enough.
2. Moving inputs in nested instances also breaks the case where submodules wrapped by nested FSDP instances have newly added inputs/activations in their signatures: args_tuple[0] and kargs_tuple[0] do not correctly retrieve the inputs/activations for these nested instances.
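
The root-only behavior described above can be illustrated with a toy sketch. This is not FSDP's actual implementation: `FakeTensor`, `ToyWrapper`, `is_root`, and `moves` are hypothetical names made up for illustration, and no PyTorch API is used. It only shows that when the root wrapper alone moves its inputs, a nested wrapper with an extra input in its signature receives its arguments untouched.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only records its device."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


class ToyWrapper:
    """Hypothetical wrapper: moves inputs to its device only if it is the root."""

    def __init__(self, fn, device, is_root):
        self.fn = fn
        self.device = device
        self.is_root = is_root
        self.moves = 0  # number of inputs this wrapper actually moved

    def _move(self, value):
        if isinstance(value, FakeTensor) and value.device != self.device:
            self.moves += 1
            return value.to(self.device)
        return value

    def __call__(self, *args, **kwargs):
        if self.is_root:
            # Only the root touches the inputs; nested wrappers pass them through.
            args = tuple(self._move(a) for a in args)
            kwargs = {k: self._move(v) for k, v in kwargs.items()}
        return self.fn(*args, **kwargs)


# The nested wrapper's function takes an extra input that the root never sees.
inner = ToyWrapper(
    lambda x, extra=None: (x.device, extra.device), "cuda:0", is_root=False
)
root = ToyWrapper(
    lambda x: inner(x, extra=FakeTensor("cuda:0")), "cuda:0", is_root=True
)

result = root(FakeTensor("cpu"))
```

In this sketch the root performs the single move and the nested wrapper performs none, regardless of how many inputs the nested signature declares.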
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91078
Approved by: https://github.com/mrshenli, https://github.com/rohan-varma