Force a sync on non-CPU tensors so the benchmark reports accurate timings. (#48856)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48856
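
Why the sync matters, as a minimal stdlib-only sketch rather than the code in this change: work on a non-CPU backend is typically queued asynchronously, so a timer stopped right after the call measures only the launch, not the execution. The names `launch`, `benchmark`, and `_device` below are hypothetical stand-ins for an asynchronous device queue and are not PyTorch APIs.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for an asynchronous device queue (not a PyTorch API).
_device = ThreadPoolExecutor(max_workers=1)


def launch(work_seconds: float) -> Future:
    """Enqueue 'device' work and return immediately, like an async kernel launch."""
    return _device.submit(time.sleep, work_seconds)


def benchmark(work_seconds: float, sync: bool) -> float:
    """Return elapsed wall-clock seconds for one launch, optionally synchronized."""
    start = time.perf_counter()
    pending = launch(work_seconds)
    if sync:
        pending.result()  # block until the queued work has actually finished
    elapsed = time.perf_counter() - start
    pending.result()  # drain the queue so consecutive runs do not overlap
    return elapsed


unsynced = benchmark(0.05, sync=False)
synced = benchmark(0.05, sync=True)
print(f"without sync: {unsynced * 1e3:.2f} ms, with sync: {synced * 1e3:.2f} ms")
```

Without the sync, the reported time covers only the enqueue; with it, the time includes the work itself. In PyTorch, copying a result back with `.cpu()` or calling `torch.cuda.synchronize()` on CUDA are ways to wait for device work; which mechanism this change uses is not stated in this message.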
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Differential Revision: D25339803
Pulled By: AshkanAliabadi
fbshipit-source-id: fdfd9a0e0cc37245d7671419f492e445396fbdb8