Fix run.py --profile options to support choosing device activity (#493)
Summary:
When CPU and GPU trace activities are combined into a single
profile trace, the CPU profiling overhead hides all the GPU gaps
in the output trace. We need to collect only the GPU trace when
running in -d cuda mode.
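
A minimal sketch of the intended selection logic, assuming the device string comes from the -d flag. The helper names `select_profile_activities` and `to_torch_activities` are hypothetical and are not the code changed in run.py; only `torch.profiler.ProfilerActivity` is an existing PyTorch API.

```python
from typing import List


def select_profile_activities(device: str) -> List[str]:
    """Pick which profiler activities to record for a given -d device.

    Hypothetical helper for illustration only.
    """
    if device == "cuda":
        # Record GPU activity alone so that CPU profiling overhead
        # does not hide the GPU gaps in the trace.
        return ["cuda"]
    return ["cpu"]


def to_torch_activities(names: List[str]):
    """Map activity names to torch.profiler.ProfilerActivity values."""
    # Imported lazily so the selection logic above works without PyTorch.
    from torch.profiler import ProfilerActivity

    mapping = {"cpu": ProfilerActivity.CPU, "cuda": ProfilerActivity.CUDA}
    return [mapping[name] for name in names]
```

The resulting list could then be passed as the `activities` argument of `torch.profiler.profile`.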
Pull Request resolved: https://github.com/pytorch/benchmark/pull/493
Reviewed By: xuzhao9
Differential Revision: D31509728
Pulled By: aaronenyeshi
fbshipit-source-id: 64d0ddf49c8814955f9c1db7ef89167881eeb511