allow users to set sample rate and add per iteration latency breakdowns (#53145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53145
add a new API to allow users to set sample rate for runtime stats, also add per iteration latency breakdowns to DDPLoggingData struct. e.g.
if users set sample rate to be 1, they can analyze per iteration latency change over time (not avged)
ghstack-source-id: 123443369
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26763957
fbshipit-source-id: baff6a09c2a590e6eb91362ca6f47ae8fa6ddb0e