Add distributed data parallel benchmark tool (#35198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35198
The need for this tool was motivated by #28883. In the past, we have
done ad-hoc benchmarking, but it's time for something more structured.
It would be nice to add more model architectures, so that running this
suite a few times gives a full picture of the performance impact of a
code change.
Test Plan: Imported from OSS
Differential Revision: D20591296
Pulled By: mrshenli
fbshipit-source-id: ee66ce0ebca02086453b02df0a94fde27ab4be49