Add FSDP to ddp/dynamo benchmarks (#1318)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1318
- Make apply_trainer configurable by letting callers supply their own implementation. This is needed because you might want to benchmark different FSDP configurations, or configure FSDP differently per model; see, for example, the T5_large and timm_vision_transformer FSDP wrapping functions in this PR (a sketch of such a wrapper follows the list below).
- Make the ddp script work for DDP _and_ FSDP
- Add T5 and timm_vision_transformer_large for FSDP
- Monitor max memory in the distributed trainer (see the memory-measurement sketch after this list)
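
A minimal sketch of what a per-model apply_trainer override for FSDP could look like, assuming the benchmark passes the bare model to the callable and trains whatever it returns. The function name `apply_fsdp`, the `layer_cls` parameter, and the choice of auto-wrap policy are illustrative assumptions, not the exact code in this PR:

```python
import functools

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


def apply_fsdp(model: nn.Module, layer_cls: type) -> nn.Module:
    """Wrap `model` in FSDP, sharding at transformer-block granularity."""
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={layer_cls},
    )
    return FSDP(model, auto_wrap_policy=wrap_policy)
```

For a T5-style model, `layer_cls` would be its encoder/decoder block class; for a ViT, the transformer block class. This is what makes a single configurable hook preferable to one hard-coded wrapping: the right sharding boundary differs per model.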
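For the max-memory monitoring, a minimal sketch using the standard `torch.cuda` peak-memory counters; the `run_and_measure` helper and its loop structure are illustrative, not the trainer's actual code:

```python
import torch


def run_and_measure(train_step, num_iters: int = 10) -> float:
    """Run `train_step` repeatedly and return peak GPU memory in GiB."""
    torch.cuda.reset_peak_memory_stats()
    for _ in range(num_iters):
        train_step()
    torch.cuda.synchronize()  # make sure all kernels have finished
    return torch.cuda.max_memory_allocated() / 2**30
```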
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D41475962
Pulled By: davidberard98
fbshipit-source-id: 8c6463a604472047236a547f9e7fd40e78e510ee