[FSDP] Add no_sync() context manager (#72446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72446
**Overview**
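This addresses https://github.com/pytorch/pytorch/issues/72183 by upstreaming Fairscale's `no_sync()` context manager to `FullyShardedDataParallel`. Inside the context, gradient synchronization across ranks is skipped, so gradients can be accumulated locally over several iterations before they are synchronized.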
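For illustration, a minimal sketch of gradient accumulation with `no_sync()`. The helper name `accumulate_and_step` and its parameters are hypothetical and not part of this PR. The sketch assumes only that `model` exposes `no_sync()` as a context manager (as an FSDP-wrapped module should after this change) and that the loss supports division and `backward()`.

```python
from contextlib import nullcontext


def accumulate_and_step(model, optimizer, loss_fn, micro_batches):
    """Run one optimizer step over several micro-batches.

    Hypothetical helper: `model` is assumed to be an FSDP-wrapped module
    (or anything else exposing a `no_sync()` context manager).
    """
    micro_batches = list(micro_batches)
    num_batches = len(micro_batches)
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(micro_batches):
        is_last = i == num_batches - 1
        # Skip gradient synchronization on every micro-batch except the
        # last one, so communication happens once per optimizer step.
        ctx = nullcontext() if is_last else model.no_sync()
        with ctx:
            # Scale the loss so the accumulated gradient is an average
            # over the micro-batches rather than a sum.
            loss = loss_fn(model(inputs), targets) / num_batches
            loss.backward()
    optimizer.step()
```

With a real setup, `model` would be a `FullyShardedDataParallel` instance and `loss_fn` something like `torch.nn.functional.cross_entropy`. One trade-off to keep in mind: gradients accumulated under `no_sync()` are likely to use more memory, since they are held unreduced until the next synchronized backward pass.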
**Test Plan**
- `_test_no_sync()` is generalized from Fairscale (see [here](https://github.com/facebookresearch/fairscale/blob/89e1ae5f1631c6e038268240f92084dc2f6f2dd8/tests/nn/data_parallel/test_fsdp_grad_acc.py#L66)).
- `test_communication()` is generalized from Fairscale (see [here](https://github.com/facebookresearch/fairscale/blob/89e1ae5f1631c6e038268240f92084dc2f6f2dd8/tests/nn/data_parallel/test_fsdp_grad_acc.py#L128)).
I tested with world sizes of 2 and 4 on the AWS cluster:
```
gpurun python test/distributed/fsdp/test_fsdp_no_sync.py
gpurun4 python test/distributed/fsdp/test_fsdp_no_sync.py
gpurun python test/distributed/fsdp/test_fsdp_comm.py
gpurun4 python test/distributed/fsdp/test_fsdp_comm.py
```
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D34085750
Pulled By: awgu
fbshipit-source-id: 8b492d8e941049a7f5ae211f3bb4042a57f5c217
(cherry picked from commit e14f1dce1a43c6a5389e534a8a176fc39ddb7396)