Small fixes to run.py (#495)
Summary:
Add a synchronize to the --profile path so that it matches the
timed run in run_one_step(). This also keeps warmup kernel execution
out of the trace, so the trace clearly shows a single run. Add a
third timestamp to distinguish CPU dispatch time from CPU wall
time, which are not the same.
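
A minimal sketch of the three-timestamp idea, not the actual run.py code. The names time_one_step, fake_step, and fake_synchronize are hypothetical, and a worker thread stands in for an asynchronous device. In a real GPU run, torch.cuda.synchronize() would presumably play the role of the synchronize callback.

```python
import threading
import time


def time_one_step(step, synchronize):
    """Return (dispatch_ms, wall_ms) for one call to step().

    Hypothetical helper: step() may return before its work finishes,
    and synchronize() blocks until all outstanding work is done.
    """
    t0 = time.perf_counter()  # timestamp 1: before launching work
    step()
    t1 = time.perf_counter()  # timestamp 2: work has been dispatched
    synchronize()
    t2 = time.perf_counter()  # timestamp 3: work has completed
    return (t1 - t0) * 1e3, (t2 - t0) * 1e3


# Stand-in for an asynchronous device (assumption for illustration only):
# the "kernel" is a 50 ms sleep running on a worker thread.
pending = []


def fake_step():
    worker = threading.Thread(target=time.sleep, args=(0.05,))
    worker.start()
    pending.append(worker)


def fake_synchronize():
    while pending:
        pending.pop().join()


dispatch_ms, wall_ms = time_one_step(fake_step, fake_synchronize)
print(f"cpu dispatch: {dispatch_ms:.3f} ms, cpu wall: {wall_ms:.3f} ms")
```

Dispatch time covers only the launch of the work, while wall time also includes waiting for it to finish. Calling the same synchronize after warmup and before the profiled step is what would keep leftover warmup work out of a trace.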
Pull Request resolved: https://github.com/pytorch/benchmark/pull/495
Reviewed By: eircfb
Differential Revision: D31518258
Pulled By: aaronenyeshi
fbshipit-source-id: 10d8144843886961af8437d09ff08a94b710588d