[DDP] Add host-side time to CUDATimer (#62770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62770
Adding host-side timing of the forward pass, backward computation, backward
communication, etc. will help detect desynchronization issues.
ghstack-source-id: 135195680
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30115585
fbshipit-source-id: 509bf341c5c92dcc63bdacd3c1e414da4eb4f321