[Part 1] Upstreaming FairScale FSDP to PyTorch -- sharding, core data flow and hooks (#63881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features needed to make FSDP work: sharding, the core data flow, and hooks. More tests will be added in follow-up PRs. Tests are refactored to use common PyTorch utils. The code is also lightly refactored. Alternatives to the ".data" usage in this PR are still being discussed offline.
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D30521673
fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500