Fix deepspeed for bert and t5 (#1181)
Summary:
Adding FSDP broke DeepSpeed because FSDP requires the model and the other training components to be passed to accelerator.prepare() in separate calls, whereas DeepSpeed requires all components to be prepared in a single call. Fix DeepSpeed by creating two prepare paths.
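A minimal sketch of the two-path idea, assuming a HuggingFace Accelerate-based trainer; the function name and component arguments are hypothetical, not the benchmark's actual code:

```python
# Hypothetical helper illustrating the two prepare() paths.
from accelerate import Accelerator


def prepare_components(accelerator: Accelerator, model, optimizer,
                       dataloader, scheduler, use_deepspeed: bool):
    if use_deepspeed:
        # DeepSpeed path: every component must go through a single
        # prepare() call so the DeepSpeed engine is built around the
        # model/optimizer pair together.
        model, optimizer, dataloader, scheduler = accelerator.prepare(
            model, optimizer, dataloader, scheduler
        )
    else:
        # FSDP path: wrap the model first, then prepare the remaining
        # components in a separate call so they are set up against the
        # already-wrapped model.
        model = accelerator.prepare(model)
        optimizer, dataloader, scheduler = accelerator.prepare(
            optimizer, dataloader, scheduler
        )
    return model, optimizer, dataloader, scheduler
```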
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1181
Reviewed By: xuzhao9
Differential Revision: D39529309
Pulled By: erichan1
fbshipit-source-id: e6be8502098211240e07fcd761f65907d8bf0742